Relatively little attention has been paid to the problem of measuring the
efficiency of graph algorithms. The fact that the amount of work required by
most graph algorithms varies greatly and unpredictably with the structure of the
graph to which they are applied makes this problem both practically important and
theoretically difficult.

Two  major  goals  were  set  at  the  outset  of  this  investigation:   first,  to 
investigate  and  develop  general  approaches  and  specific  techniques  for  analyzing 
the  efficiency  of  graph  algorithms,  and  second,  to  test  and  illustrate  some  of 
these  approaches  and  techniques  by  using  them  for  the  analysis  and  comparison  of 
specific  algorithms. 

With respect to the first goal, empirical and analytical methods are discussed.
The use of empirical methods is greatly facilitated by a Graph Algorithm
Software Package, GASP, which is an extension of PL/1 and has sets and graphs as
additional data types.

With  respect  to  the  second  goal,  the  problem  of  finding  all  the  spanning 
trees  of  a  graph  was  chosen.   All  published  algorithms  are  analyzed  and  compared. 
A  new  algorithm  is  described,  analytically  compared  to  the  previous  algorithms, 
and  found  to  be  superior.   For  example,  on  the  complete  graph  on  n  nodes,  cost 
(new  algorithm)  =  cost  (A) / (/z        ),  where  A  is  the  most  efficient  previous 
algorithm. 

1.    INTRODUCTION

1.1.   Goals  of  This  Investigation 

Graph theory and its applications have received much attention over the
past two decades. In particular, many algorithms have been proposed for the
solution of several graph problems which arise frequently in certain applications.
However, relatively little attention has been paid to the problem of measuring
the efficiency of these proposed algorithms. The fact that the amount of work
required by most graph algorithms varies greatly and unpredictably with the
structure of the graph to which they are applied makes this problem both
practically important and theoretically difficult.

Two  major  goals  were  set  at  the  outset  of  this  investigation:   first,  to 
investigate  and  develop  general  approaches  and  specific  techniques  for  analyzing 
the  efficiency  of  graph  algorithms,  and  second,  to  test  and  illustrate  some  of 
these  approaches  and  techniques  by  using  them  for  the  analysis  and  comparison  of 
specific  algorithms. 

With  respect  to  the  first  goal,  there  are  two  major  categories  of  methods 
for  measuring  the  efficiency  of  graph  algorithms:   empirical  and  analytical. 
Empirical  methods  are  discussed  in  chapter  2,  analytical  methods  in  chapter  3. 
Empirical  methods  are  greatly  facilitated  by  a  Graph  Algorithm  Software  Package, 
GASP,  which  is  described  in  detail  in  appendix  1. 

With  respect  to  the  second  goal,  one  graph  problem,  that  of  finding  all  the 
trees  of  a  given  graph,  was  chosen.   It  is  discussed  in  chapter  4.   There  are  many 
published  algorithms  for  this  problem,  and  these  are  described  in  chapter  5.   By 
concentrating  on  these  algorithms,  useful  techniques  for  efficiency  analysis  were 
developed  and  tested.   Such  detailed  study  also  led  to  the  development  of  a  new 
algorithm  for  finding  all  the  trees  in  a  graph.   This  new  algorithm  is  analytically 
compared  to  the  best  among  the  known  algorithms  in  chapter  6  and  is  found  to  be 
superior  to  them. 


1.2.  Related Efforts

Previous efforts which share some of the goals of this investigation fall
mainly into two categories: first, the analysis of specific algorithms, and
second, the development of general purpose graph software. A brief review of
some of the most relevant papers follows.

Authors of spanning tree algorithms sometimes present data on the
performance of their algorithms ([Dawson 68], [Stehman 69]). This approach suffers
from the fact that one cannot deduce the efficiency of an algorithm from its
performance on isolated examples. A systematic comparison of seven algorithms
on 13 graphs was done by Fernandez in his thesis [Fernandez 69a], and is mentioned
in an abstract [Fernandez 69b].

Notable  among  the  analyses  of  other  graph  algorithms  are  Gotlieb  and  Cornell's 
experiments  with  algorithms  for  finding  a  fundamental  set  of  circuits  ([Gotlieb  67], 
see  also  [Paton  69]),  Shirey's  analysis  of  algorithms  for  testing  the  planarity  of 
graphs  [Shirey  69],  and  Cornell  and  Gotlieb's  analysis  of  an  algorithm  for  testing 
graph isomorphism [Cornell 70].

The second category consists of papers which describe languages for graph
processing ([Friedman, 69], [Hart, 69], [Read   ], [Wolfberg, 69]). These
languages and GASP are similar in the sense that they all include graphs and sets as
data types, and they all are extensions of an existing base language (e.g., FORTRAN,
LISP). GASP is the only one which is an extension of PL/1, the richest widely
available programming language.

1.3.  Notation 

Throughout this thesis, "n" will stand for the number of nodes in a given
graph; "b" will stand for the number of branches; "t" will stand for the number
of spanning trees; "c" will stand for the time cost of an algorithm.

Upper bounds on c will be expressed as c = O(f(n,b,t)), which means that
there exists a constant A such that c ≤ A · f(n,b,t) for sufficiently large
n, b, and t. Similarly, lower bounds will be expressed as f(n,b,t) = O(c),
which implies that there exists a constant A such that c ≥ A · f(n,b,t) for
sufficiently large n, b, and t.

The cardinality of a set s will be denoted |s|. The symmetric difference
(exclusive or) of sets s1 and s2 will be denoted s1 ⊕ s2.

Truth values, YES and NO, will be combined using "&", "or", and "¬".

A graph G consists of a set of nodes {v_1, v_2, ..., v_n} and a set of
branches {e_1, ..., e_b}. The set of branches incident to v_i will be denoted
B_i. The degree of a node, "degree(v_i)", is equal to |B_i|.


2.    EMPIRICAL  METHODS  OF  MEASURING  EFFICIENCY  OF  COMPUTATION 

There  are  several  methods  which  could  be  used  to  measure  the  efficiency  of 
graph  algorithms.   These  methods  tend  to  fall  into  two  categories,  empirical  and 
analytical.   Of  course,  some  methods  have  both  empirical  and  analytical  features, 
but  the  division  into  categories  is  still  useful  in  order  to  understand  general 
principles. 

Empirical methods consist of implementing the algorithm on a computer,
running several tests on it, measuring the cost, and drawing some form of
conclusion from the observed data. This approach has several advantages and
disadvantages.

2.1.  Advantages 

The  first  advantage  is  that  empirical  measures  are  often  easier  to  obtain 
than  analytic  measures.   This  is  especially  true  if  one  needs  the  implemented 
algorithm  to  solve  problems.   Obtaining  data  is  relatively  trivial.   Interpreting 
the  data  may  be  easy  (e.g.,  if  one  only  wants  to  compare  algorithms  qualitatively 
to  find  out  which  algorithm  is  best),  or  may  be  very  difficult  (e.g.,  if  one  wants 
a  quantitative  prediction  of  the  cost  on  graphs  which  have  not  been  tested). 

A  second  advantage  occurs  after  a  graph  algorithm  has  been  implemented: 
the  programmer  often  sees  ways  to  improve  its  efficiency  (both  on  a  programming 
level  and  a  graph  theoretical  level).   Insights  into  measuring  the  efficiency 
may  occur  as  well. 

Finally,  empirical  results  produce  numbers  corresponding  to  actual  run 
times  which  may  prove  to  be  more  useful  than  analytically  derived  formulas  which 
often  yield  only  rates  of  growth. 

2.2.  Disadvantages 

One major disadvantage of experimental testing of efficiency of graph
algorithms is that the run time of an implemented program depends on many factors
which have little or nothing to do with the algorithm proper. These factors
include the particular computer, language, and programmer; the implementation;
and the method of representing graphs. A change in some of these factors could
reverse the experimental conclusion of the superiority of one algorithm over
another. Similarly, once the machine on which they were obtained becomes
obsolete, experimental results are likely to lose their value.

The other major disadvantage of experimental testing is that because of
computer time costs, only a small number of tests can be run. If the amount of
computation required by the algorithm is sensitive to the structure of the graph,
it becomes very difficult to accurately extend the results of tests on a small
number of graphs to the class of all graphs.

Similarly, if the algorithm requires computation time which increases rapidly
with increasingly large graphs, experimental measures will be limited to tests on
small graphs. For tree-finding programs, 15-node graphs may be too large [Dawson
68]. Costs of algorithms applied to small graphs usually will permit only very
poor extrapolations to the costs of larger graphs.

Many  authors  hide  the  inefficiency  of  their  algorithms  by  illustrating  them 
on  small  graphs  where  they  appear  reasonable.   When  applied  to  slightly  larger 
graphs,  the  algorithms  require  considerably  more  computation. 

2.3.   Lessening  the  Disadvantages  by  Using  Better  Measuring  Techniques 

The two disadvantages mentioned in the previous section are due in varying
degrees to the use of data consisting of computer run times. In order to obtain
data dependent on properties of the algorithm rather than on the particular
computer system used, the following technique can be used.

Divide the given program into logical groups of operations which have the
property that during any test of the program, all the operations in a section
will be executed the same number of times. Insert counters into the program,
one to each logical section. Assign weights to each operation, and compute the
total weight of a section as the sum of the weights of all the operations in the
section. Take the section counts from a computer test, multiply them by the
corresponding weights, sum over all sections, and you have the total cost of
that test graph.

This technique decreases the dependence on the particular computer system
used because one can arbitrarily assign weights to operations in a manner
consistent with any imaginable computer system. The cost then corresponds to the
run time on the imaginary computer, perhaps quite different from the real time
cost of the test run. Furthermore, many different costs (based on different
imaginary machines) can be computed at the cost of just one real test. To achieve
this, simply save the section counts and change the weight system.
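
The counting technique can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the instrumented program (a toy routine
that sums node degrees), the section names "per_node" and "per_incidence", and
both weight systems are hypothetical and do not come from GASP.

```python
from collections import Counter

def total_cost(section_counts, section_weights):
    """Weighted cost: sum over sections of (execution count * section weight)."""
    return sum(count * section_weights[name]
               for name, count in section_counts.items())

def count_sections(adjacency):
    """Toy instrumented program (hypothetical): sums the degrees of all nodes.

    One counter is kept per logical section; every operation inside a
    section executes the same number of times as the section itself.
    """
    counts = Counter()
    degree_sum = 0
    for node, neighbours in adjacency.items():
        counts["per_node"] += 1           # section run once per node
        for _ in neighbours:
            counts["per_incidence"] += 1  # section run once per (node, neighbour) pair
            degree_sum += 1
    return counts, degree_sum

# Complete graph on 4 nodes, stored as an adjacency dictionary.
k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
counts, _ = count_sections(k4)            # one real test run

# Two imaginary machines: the saved counts are reused, only the weights change.
machine_a = {"per_node": 5, "per_incidence": 1}
machine_b = {"per_node": 2, "per_incidence": 3}
cost_a = total_cost(counts, machine_a)
cost_b = total_cost(counts, machine_b)
```

The section counts are gathered once; each additional weight system costs only
one weighted sum.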

This  technique  may  increase  slightly  the  size  of  test  graphs  which  can  be 
directly  measured.   Once  the  program  has  been  debugged,  any  code  which  does  not 
affect  the  flow  of  the  program  can  be  removed,  reducing  the  real  time  required 
for  a  test  without  changing  the  computed  cost. 

GASP  is  very  useful  when  the  above  technique  is  applied.   GASP  allows 
programmers  to  express  graph  and  set  operations  in  natural  terms,  without  regard 
to  how  these  objects  are  represented.   Similarly,  the  operations  on  these  objects 
are  expressed  independently  of  their  implementation.   Assigning  a  reasonable  set 
of  weights  to  GASP  operations  is  easy. 

Because  programs  written  in  GASP  are  independent  of  the  representations, 
it  is  possible  to  run  the  same  program  with  many  different  versions  of  GASP, 
thereby  obtaining  experience  with  different  representations.   GASP  is  structured 
so  that  small  changes  can  be  made  in  some  GASP  routines  and  data  structures  without 
requiring  changes  in  the  routines  which  use  them. 


3.    ANALYTICAL  METHODS  OF  MEASURING  EFFICIENCY  OF  COMPUTATION 

In  contrast  to  empirical  methods,  analytical  methods  involve  the  mathematical 
analysis  of  the  computational  structure  of  algorithms.  This  approach  also  has  its 
relative  advantages  and  disadvantages. 

3.1.  Advantages 

First, analytical results hold for arbitrarily large graphs, where
experimental results would have to be extrapolated. Thus analytical results give a
better indication of the true nature of the algorithm.

Second, analytical measures are usually performed on the algorithm proper
rather than a machine-dependent implementation of the algorithm. Thus the
results will not become obsolete when implementations improve.

3.2.  Disadvantages 

The big disadvantage with the analytical approach is that many graph
algorithms are difficult to measure analytically, especially when the cost of the
algorithm varies greatly with the structure of the graph (and not just its size).
The goal of analytical methods is to express the cost in terms of a few easily
calculated parameters of the input graphs. For some algorithms, this goal is
unobtainable, and one must do at least one of the following:

1.  Restrict  the  estimates  and  bounds  to  apply  only  to  some 
subset  of  the  set  of  all  graphs. 

2.  Introduce  more  complicated  parameters. 

3.  Accept  larger  measurement  errors. 

Another  possible  disadvantage  of  analytical  measures  is  that  they  are  derived 
for  large  graphs,  so  that  small  terms  and  details  can  be  ignored.   However,  if 
for  some  reason  the  algorithm  is  applied  only  to  small  graphs,  the  ignored 
information  may  be  more  important  than  the  derived  formula. 


3.3.  Types of Analysis

There are several techniques which can be used in making analytical
measures of efficiency. These techniques will be illustrated by applying
them to an algorithm, A, of the following structure.

A:   "Pick an arbitrary node X_0.
      For all nodes X adjacent to X_0, do S."

S is an operation whose cost is large but constant, so that the total cost
of A is determined by the number of executions of S.

Some of the techniques will be more significantly used (and therefore
illustrated) in chapter 6.

A standard technique for measuring an algorithm's efficiency is worst-case
analysis. If applied to algorithm A, the following analysis might take
place. "The bound variable X takes its values from the set of nodes of the
graph; therefore, n is a bound for the number of times S is executed.
Hence, c = O(n)."

This method is usually the easiest to apply, but usually the least
accurate. If an algorithm has c = O(n^k) with k very small (say 2 or 3),
then worst-case analysis may be accurate for some graphs. However, for
less efficient algorithms or for typical graphs, the errors can grow
rapidly and often become intolerable.

In order to get bounds which are tighter than those from worst-case
analysis, it is usually necessary to make assumptions. That is, the test
graphs are assumed to have certain properties. For example, assume all
nodes have the same degree, d (which could be either a constant or some
small function of n). Algorithm A would be analyzed as follows: "X will
take on d values because that is exactly the number of nodes adjacent to
X_0. Therefore, c = O(d)."

Assumptions should be chosen with care. Too many will make the
analysis easy, but the conclusions will be of limited use. Too few may
weaken the analysis so that only very loose bounds can be obtained.


Particularly  useful  assumptions  are  those  which  specify  the  test  graphs  in 
terms  of  one  or  more  parameters.  With  such  assumptions,  analytic  bounds  can  be 
derived  and  expressed  in  terms  of  the  parameters. 

For many algorithms, a useful one-parameter family of test graphs is the
complete graph on n nodes. Complete graph analysis of Algorithm A would be
as follows: "X_0 is adjacent to all of the other n-1 nodes; therefore,
c = O(n-1)."
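
The three analyses of Algorithm A can be checked on small cases with the
following Python sketch. The adjacency-dictionary representation and the helper
names (executions_of_S, complete_graph, circuit) are assumptions made for
illustration; only the count of executions of S is modelled, not S itself.

```python
def executions_of_S(adjacency, x0):
    """Number of times S runs in Algorithm A: once per node adjacent to x0."""
    return len(adjacency[x0])

def complete_graph(n):
    """Complete graph on nodes 0..n-1: every node has degree n-1."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}

def circuit(n):
    """Circuit on n >= 3 nodes: every node has the same degree d = 2."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

n = 6
on_complete = executions_of_S(complete_graph(n), 0)  # complete graph analysis: n - 1
on_circuit = executions_of_S(circuit(n), 0)          # equal-degree assumption: d

# The worst-case bound n holds for both, but it is loose on the circuit.
worst_case_bound = n
```

On the circuit the worst-case bound overestimates the count by a factor that
grows with n, while the equal-degree analysis is exact.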

Other  possible  examples  of  parameterized  classes  of  graphs  include  circuits 
of  n  nodes,  ladders  of   r  rungs,  star  graphs  of  b  branches,  rectangular  grids 
of  r   rows  and  c   columns,  and  others  of  even  more  parameters. 

In addition to making the analysis easier, assumptions may be chosen in a
way that reflects the intended use of the algorithm. For example, if the
application is in electrical network theory, assumptions such as planarity or bounded
degree of nodes may reflect physical limitations of the hardware.

The main disadvantage of these techniques is that the assumptions restrict
the set of graphs for which the conclusions are valid. It is possible that the
conclusions will be false for most graphs. This disadvantage is lessened when
estimates which are derived on a small class of graphs can be used as bounds on a
larger class. For example, if the cost of an algorithm increases whenever a
non-parallel branch is added to the test graph, then the cost of that algorithm on
the complete graph on n nodes will be an upper bound of the cost on any graph
on n nodes.

When the task is to compare two or more algorithms and to determine which
one is best, there are two approaches which can be used. The first approach is
to apply the previously discussed techniques on each algorithm individually, and
then compare the derived estimates and bounds. The second approach is to analyze
directly the computational aspects of the differences between the competing
algorithms.

To illustrate the second approach, suppose Algorithm B is obtained by
modifying Algorithm A so that X_0 is chosen to be a node of minimum degree.
Then the comparison analysis may be as follows: "If the computation required in
B to find a minimum degree X_0 is negligible, then Algorithm B is better than
Algorithm A because S is executed fewer times."
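
A minimal Python sketch of this direct comparison follows. The star graph and
the function names are hypothetical choices for illustration, and the cost of
finding the minimum-degree node is ignored, as the quoted analysis assumes.

```python
def executions_A(adjacency, x0):
    """Algorithm A: S runs once per node adjacent to the arbitrary node x0."""
    return len(adjacency[x0])

def executions_B(adjacency):
    """Algorithm B: same as A, but x0 is a node of minimum degree."""
    x0 = min(adjacency, key=lambda v: len(adjacency[v]))
    return len(adjacency[x0])

# Star graph (illustrative): centre 0 joined to nodes 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

b_count = executions_B(star)
a_counts = [executions_A(star, v) for v in star]  # every possible choice of x0
a_worst = max(a_counts)
```

No assumption about the graph was needed: for any choice of X_0 in A, B
executes S no more often, and on the star it executes S once where A may
execute it four times.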

One  advantage  of  direct  comparison  is  that  the  analysis  is  often  easier, 
thus  fewer  (if  any)  assumptions  will  be  required.  With  fewer  assumptions,  the 
conclusions  will  be  valid  for  a  larger  set  of  graphs  (perhaps  all  graphs). 

Another advantage is that the inefficient parts of the algorithms are
pinpointed. Such knowledge about the parts would be useful if it is possible to
recombine the parts into new algorithms, or if analogous parts appear in another
pair of algorithms.

A  disadvantage  of  direct  comparison  is  that  numerical  bounds  for  individual 
algorithms  are  not  automatically  produced.   A  related  disadvantage  is  that  this 
method  cannot  be  used  on  an  algorithm  which  has  nothing  in  common  with  other 
algorithms. 

4.    THE PROBLEM OF FINDING ALL THE SPANNING TREES IN A GRAPH

4.1.  The  Problem:   Its  Variations  and  Applications 

In  order  to  make  meaningful  comparisons  among  graph  algorithms,  it  is 
useful  to  focus  on  a  single  graph  problem.   For  this  thesis,  the  chosen  problem 
is  that  of  finding  (i.e.,  listing  exactly  once)  all  the  spanning  trees  of  a 
connected  undirected  graph.   A  [spanning]  tree  is  a  set  of  [n-1]  branches  which 
are  connected  and  contain  no  circuits. 

There  are  several  variations  of  the  problem,  including  the  following: 

1.  count  the  number  of  trees  in  any  given  graph  [see  section  5.2]; 

2.  find  formulas  for  the  number  of  trees  in  special  graphs  ([Bercovici 
69],  [Cayley  89],  [Mullin  67],  [Myers  65],  [O'Neil  66b],  [Riordan 
60]); 

3.  find  all  spanning  trees  of  a  directed  graph  ([Chen  66b,  67], 
[Paul  67]); 

4.  find  all  spanning  trees  common  to  two  related  graphs  ([Ardon  69], 
[Mayeda  66,  68],  [Stehman  69]); 

5.  find,  for  a  given  (directed  or  undirected)  graph,  all  k-trees  (which 
span  k  specified  components),  or  all  co-trees  (complements  of  trees), 
or  all  sets  satisfying  certain  conditions  ([Berger  68],  [Chen  65,  66a, 
69a,  69d],  [Dunn  68],  [Mayeda  57],  [Paul  67]); 

6.  find  two  spanning  trees  with  minimal  intersection  [Chase    ]; 

7.  find all rooted ordered trees of the complete graph [Scions 68].

Only  spanning  trees  on  undirected  graphs  will  be  considered  for  the  remainder 
of  this  thesis,  so  the  following  conventions  will  be  used.   The  term  "tree"  will 
mean  spanning  tree,  "graph"  will  mean  undirected  graph,  and  "finding  all  trees" 
will  mean  listing  all  the  trees  of  a  given  graph  without  duplications.   Factoring 
the  trees  into  (unions  of)  cartesian  products  is  allowed;  the  applications  (see 
below)  can  use  answers  in  this  form  ([Bedrosian  62],  [Chen  69a],  [Dunn  68]). 

The primary application of finding all trees is in the analysis of linear
electrical networks ([Hakimi 66], [Stehman 69], [Weinberg 58]). A second
application is in the analysis of multilevel masers [Bedrosian 62]. Other potential
applications have been mentioned.

4.2.  The  Algorithms:   Their  Common  Features 

At  least  ten  distinct  algorithms  for  finding  all  trees  have  been  proposed 
in  the  vast  literature  on  this  subject.   In  addition  to  their  large  number, 
these  algorithms  have  other  properties  which  make  them  highly  desirable  objects 
for  efficiency  measurements. 

One  property  of  these  algorithms  is  that  the  cost,   c   (as  well  as  the 
number  of  answers,   t)  grows  exponentially  with  the  size  of  the  test  graph. 
Exponential  algorithms  are  desirable  as  objects  of  efficiency  measurements 
(both  empirical  and  analytical)  because  the  large  growth  rates  magnify  the 
differences  between  algorithms.   Thus  the  inferiority  of  a  bad  algorithm  will  be 
apparent  even  on  small  graphs.   Competing  exponential  algorithms  usually  have 
a  variety  of  growth  rates,  allowing  analytical  measurements  to  determine  the 
most efficient algorithm, because only the growth rates of the costs of algorithms
are considered in analytical measurements. Examples of competing algorithms which
cannot be analytically contrasted because they share a common growth rate [n^3]
are the better algorithms for testing the planarity of graphs [Shirey 69].

Although  different  ideas  for  exponential  algorithms  can  be  contrasted  by 
analytical  measurements,  differences  in  graph  representation  and  differences  in 
implementation efficiency do not show up. If an algorithm is more efficient in
a particular representation, it will always pay to convert the input graph into
that representation because the cost of conversion is O(n^2) which will be
small when added to an exponential term. Similarly, implementation improvements
can do no better than to reduce the cost by a constant factor, which will not
affect growth rates.

Another property of these algorithms is that analytical bounds are difficult
to derive (this explains the complete lack of meaningful bounds by the many
authors of these algorithms). One of the reasons for this difficulty is that
the cost of these algorithms depends greatly on the parameter t, which cannot
be expressed in terms of n and b (except for a few special graphs).
Fortunately, the scarcity of individual bounds in terms of n and b does not rule
out comparison analysis. For example, any algorithm whose cost grows faster than
t will be inferior to any algorithm whose cost grows slower than t.

A  property  of  these  algorithms  which  aids  direct  comparison  analysis  is  that 
many  of  them  can  be  arranged  in  a  sequence  in  which  the  difference  between  one 
algorithm  and  the  next  is  small.   This  property  aids  both  the  description  and  the 
analysis  of  the  algorithms  because  only  the  differences  need  to  be  described  and 
analyzed. 

5.    DESCRIPTION OF THE ALGORITHMS

This  chapter  briefly  describes  all  known  classes  of  algorithms  for  finding 
all  trees  in  a  graph.   Sections  5.1  through  5.4  are  independent  of  each  other; 
section  5.5  describes  a  variation  of  the  algorithm  in  section  5.4;  section  5.6 
introduces  the  remaining  algorithms,  each  of  which  is  described  in  terms  of  the 
differences  from  the  preceding  algorithm. 

5.1.  Exhaustion 


Exhaustion  algorithms  simply  search  through  a  large  set  of  candidate 
branch  sets,  testing  each  to  see  if  it  is  a  tree  of  the  graph. 

One  algorithm  ([Hale  61],  [MacWilliams  58],  [Mason  57],  [Mayeda  57], 
[Weinberg  58])  generates  all  sets  of  n-1  branches  from  the  graph,  and  tests 
each  set  to  see  if  it  is  a  tree. 

Another  algorithm  ([Char  68],  [Zobrist  64])  takes  a  previously  computed 
list  of  all  the  trees  on  the  complete  graph  on  n  nodes,  and  tests  each  tree 
to  see  if  all  of  its  branches  belong  to  the  input  graph. 
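
The first exhaustion algorithm can be sketched in Python as follows. This is an
illustrative sketch, not a transcription of any cited paper: the representation
(nodes numbered 0..n-1, branches as pairs) and the union-find circuit test are
assumptions chosen for brevity.

```python
from itertools import combinations

def is_spanning_tree(n, branches):
    """True if the given n-1 branches join nodes 0..n-1 without a circuit."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in branches:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:               # this branch would close a circuit
            return False
        parent[root_u] = root_v
    # n-1 circuit-free branches on n nodes are necessarily connected.
    return len(branches) == n - 1

def trees_by_exhaustion(n, branches):
    """Generate every set of n-1 branches and keep those that are trees."""
    return [candidate for candidate in combinations(branches, n - 1)
            if is_spanning_tree(n, candidate)]

# Complete graph on 4 nodes: 20 candidate branch sets, 16 of them trees.
k4_branches = list(combinations(range(4), 2))
k4_trees = trees_by_exhaustion(4, k4_branches)

# Circuit on 4 nodes: deleting any one branch leaves a tree.
c4_trees = trees_by_exhaustion(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

The number of candidates is the binomial coefficient of b over n-1, which is
why the method wastes most of its effort on graphs with few trees relative to
their branch count.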

5.2.  Determinants 


The most efficient method to calculate t, the number of trees in a
graph, is to evaluate a determinant [Harary 59]. Let M be an (n-1) by
(n-1) matrix with entries m_ij defined as follows: m_ii = degree(v_i), and
(for i ≠ j) m_ij = -(the number of branches connecting v_i to v_j). Then
t = det(M) can be calculated by using any standard method of evaluating
determinants (e.g., Gaussian elimination).
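
A small Python sketch of this count is given below. It assumes nodes numbered
0..n-1 and branches given as pairs, builds the matrix for all n nodes and then
drops the last row and column, and uses exact rational arithmetic so that
Gaussian elimination introduces no rounding error. Self-loops are skipped, on
the assumption that they are not counted in the degree.

```python
from fractions import Fraction

def count_trees(n, branches):
    """Count the spanning trees of a graph on nodes 0..n-1 via a determinant."""
    m = [[Fraction(0)] * n for _ in range(n)]
    for u, v in branches:
        if u == v:
            continue          # self-loops never belong to a tree
        m[u][u] += 1          # diagonal entries: degree of the node
        m[v][v] += 1
        m[u][v] -= 1          # off-diagonal: minus the number of joining branches
        m[v][u] -= 1
    size = n - 1
    m = [row[:size] for row in m[:size]]  # drop the last row and column

    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0          # singular matrix: the graph is not connected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det        # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
t_k4 = count_trees(4, k4)
t_c4 = count_trees(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
t_split = count_trees(4, [(0, 1), (2, 3)])   # disconnected graph
```

The elimination takes on the order of n^3 arithmetic operations, whereas
listing the trees must take at least t steps.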

Determinant algorithms ([Trent 54], [Weinberg 58]) to find all trees
need to evaluate determinants symbolically, a complicated (and costly) process.
Some of the more efficient "determinant" algorithms ([Chang 68], [Chen 68],
[Malik 67], [Nakagawa 58]) turn out to be different presentations of algorithms
to be described later (sections 5.4, 5.7, and 5.10).

5.3.  Decomposition

Many  authors  ([Berger  68],  [Chen  69a,  69b],  [Hakimi  64],  [Jong  66], 
[Kim  60],  [Lee  63],  [MacWilliams  58],  [Mayeda  59],  [Myers  67],  [Row  61], 
[Watanabe  61])  have  suggested  decomposition  as  a  method  to  find  all  trees. 
The  basic  idea  is  to  divide  the  graph  into  two  or  more  subgraphs,  to  find  the 
trees  on  these  subgraphs,  and  then  to  combine  these  partial  trees  into  trees 
of  the  input  graph.   Unlike  most  decomposition  algorithms  for  other  graph 
problems  (e.g.,  planarity  tests),  the  final  step  of  combining  the  partial 
answers  is  not  trivial. 

There  are  other  difficulties  in  constructing  decomposition  algorithms. 
Only  a  few  algorithms  ([Chen  69a,  69b],  [Kim  60],  [Mayeda  59],  [Myers  67]) 
avoid  duplication  and  its  penalty  of  checking  each  tree  against  the  list  of 
trees.   Some  algorithms  ([Chen  69a],  [Lee  63],  [MacWilliams  58],  [Myers  67]) 
can  be  applied  only  to  special  types  of  graphs. 

Apparently only one of these algorithms [Chen 69b] is general, avoids
duplications, and overcomes some of the difficulties of combining partial trees.
As in most of the references to decomposition algorithms, significant details
are not specified, so no algorithm will be described (or analyzed) here. If
the details could be worked out, a decomposition algorithm might be competitive
with the best of the existing algorithms.

5.4.  Tree  Transformations 

Several algorithms ([Chen 69c], [Fujisawa 59], [Hakimi 61, 66], [Kishi 69],
[Malik 67], [Mayeda 65, 66, 68], [Stehman 69], [Watanabe 60], [Wing 63])
are based on "elementary tree transformations". Tree Y_1 is transformed by
adding any new branch a_2 and removing any branch a_1 which lies in the path
connecting the endpoints of a_2. The new tree Y_2 = Y_1 ⊕ {a_1, a_2}.
For any two trees, Y_0 and Y, there is a sequence of trees
<Y_0, Y_1, Y_2, ..., Y_(k-1), Y_k = Y> such that for 1 ≤ i ≤ k, Y_i is an
elementary tree transformation of Y_(i-1). The "distance" from Y_0 to Y, denoted
d(Y_0,Y), is the minimum number of transformations necessary to change Y_0 into
Y. For all Y and Y_0, d(Y_0,Y) ≤ n-1.

A tree transformation algorithm begins with an initial tree Y_0. First,
all possible elementary tree transformations are applied to Y_0 to get X_1, the
set of all trees at distance 1 from Y_0. Next, X_2, the set of all trees
at distance 2 from Y_0, is found by applying elementary transformations to the
trees in X_1. Similarly, X_3 is found from X_2 by elementary transformations.
This process continues until X_r is found where r = max[over Y] d(Y_0,Y).

The  details  of  this  algorithm,  such  as  how  to  avoid  duplications,  will 
not  be  described  here  (see  [Mayeda  65]). 
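
The level-by-level process can nevertheless be sketched in Python. The sketch
below is not the method of [Mayeda 65]: it avoids duplications by the simplest
possible means, a set of all trees seen so far, which is exactly the kind of
checking a refined algorithm would try to avoid. Branches are assumed to be
pairs of nodes in a graph without parallel branches, and all function names are
illustrative.

```python
from itertools import combinations

def tree_path(tree, start, goal):
    """Branches on the unique path in `tree` from node start to node goal."""
    adjacency = {}
    for branch in tree:
        u, v = branch
        adjacency.setdefault(u, []).append((v, branch))
        adjacency.setdefault(v, []).append((u, branch))
    stack = [(start, None, [])]
    while stack:
        node, came_from, path = stack.pop()
        if node == goal:
            return path
        for neighbour, branch in adjacency.get(node, []):
            if neighbour != came_from:
                stack.append((neighbour, node, path + [branch]))
    return []

def trees_by_transformation(branches, initial_tree):
    """Return [X_0, X_1, ..., X_r], where X_i holds the trees at distance i."""
    y0 = frozenset(initial_tree)
    seen = {y0}                 # assumption: duplicates are filtered by lookup
    levels = [[y0]]
    while levels[-1]:
        next_level = []
        for tree in levels[-1]:
            for a2 in branches:                 # branch to be added
                if a2 in tree:
                    continue
                for a1 in tree_path(tree, a2[0], a2[1]):  # branch to be removed
                    new_tree = tree ^ {a1, a2}  # symmetric difference
                    if new_tree not in seen:
                        seen.add(new_tree)
                        next_level.append(new_tree)
        levels.append(next_level)
    return levels[:-1]          # drop the final empty level

k4_branches = list(combinations(range(4), 2))
levels = trees_by_transformation(k4_branches, [(0, 1), (0, 2), (0, 3)])
level_sizes = [len(level) for level in levels]
```

Starting from the star tree of the complete graph on 4 nodes, the 16 trees are
found in three levels, consistent with the bound d(Y_0,Y) ≤ n-1.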

On some graphs, the choice of Y_0 can make a big difference in the cost
of the algorithm. The best Y_0 is a "central" tree ([Deo 66], [Malik 68]),
for which max[over Y] d(Y_0,Y) is a minimum. One algorithm for finding a
central tree has been suggested [Amoia 69].

5.5.  Hamiltonian  Paths 

The trees of any graph can be arranged in a (Hamiltonian Path) sequence
Y_1, Y_2, ..., Y_t such that for 1 ≤ i ≤ t-1, d(Y_i, Y_{i+1}) = 1 ([Chen 67],
[Cummins 66], [Shank 68]). Algorithms to find trees in such an order have
been suggested ([Kamae 67], [Kishi 67, 68]). These algorithms will not be
described here because they are too complicated.

5.6.  Introduction  to  Expansion  Algorithms 

The remaining algorithms (sections 5.7 through 5.12) expand the "variable
Cartesian Product" X_1 × X_2 × ... × X_{n-1}, where the definition of the set
X_j depends on the choices of elements from X_1 through X_{j-1}. The basic
flowchart for these algorithms appears in figure 1, and will be explained in
detail in this section. Subsequent flowcharts will be described by explaining
the changes in the contents of boxes 1 through 4.

Figure 1: Expansion Algorithms

The two interlocking loops in the basic flowchart are roughly equivalent
to n-1 nested loops (fixed nested loops cannot be used because n is a
variable). The variable j specifies the nesting level. The "highest" level
is 1, the "lowest" level is n.

Each level j (1 ≤ j ≤ n-1) begins (box 2) with the calculation of X_j, a
set of branches. X_j controls the iterations at level j. Namely, an iteration
begins (box 3) with one branch being picked (and deleted) from X_j and being
assigned to the bound variable a_j.

At the lowest level, the set {a_1, a_2, ..., a_{n-1}} is processed as a tree
candidate (box 4).

When computation at a level j is completed, the algorithm "backtracks"
(box 5) to the previous level j-1, where another iteration (a_{j-1} from
X_{j-1}) leads to a new instance of level j.
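The control structure described above can be sketched as follows. This is an
illustration under stated assumptions rather than a transcription of figure 1:
recursion stands in for the two interlocking loops, and the names `expand`,
`compute_x` and `process` are hypothetical. `compute_x(j, a)` plays the role of
box 2 and `process` the role of box 4.

```python
def expand(n, compute_x, process):
    """Expand the variable Cartesian product X_1 x X_2 x ... x X_{n-1}."""
    a = []                              # bound variables a_1 .. a_{j-1}

    def level(j):
        if j == n:                      # lowest level: box 4
            process(tuple(a))
            return
        x = list(compute_x(j, a))       # box 2: X_j depends on earlier choices
        while x:
            a.append(x.pop(0))          # box 3: pick (and delete) a_j from X_j
            level(j + 1)
            a.pop()                     # box 5: backtrack to level j

    level(1)


if __name__ == "__main__":
    # Toy X_j: the integers up to 3 that exceed the previous choice.
    found = []
    expand(3, lambda j, a: range(a[-1] + 1 if a else 1, 4), found.append)
    print(found)
```

Each algorithm of sections 5.7 through 5.12 can then be read as a particular
choice of the box 2 and box 4 contents.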

5.7.  Cancellation  of  Non-Trees 

This  algorithm  ([Bellert  62],  [Chen  65],  [Maxwell  66],  [Piekarski  65]) 
is  actually  a  method  of  expanding  the  symbolic  determinant  mentioned  in  section 
5.2  [Myers  65]. 

The flowchart for this algorithm appears in figure 2. Box 2 reads
"X_j ← B_j - {a_1, a_2, ..., a_{j-1}}", which means that X_j is the set B_j
(all branches incident to node v_j) excluding any currently assigned a_i
(i = 1, 2, ..., j-1). Box 4 reads "L = L ⊕ {{a_1, a_2, ..., a_{n-1}}}", where
L is a list of tree candidates which have been generated at previous instances
of the lowest level. If {a_1, a_2, ..., a_{n-1}} is equal to a set S already
in L, then S is removed from L. {a_1, a_2, ..., a_{n-1}} is added to L if and
only if there is no such match. When the algorithm terminates, L is the list
of all trees of the graph.
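A minimal sketch of this cancellation scheme is given below, assuming nodes
numbered 0..n-1 with the last node taken as the omitted node v_n. The function
name `cancellation_of_non_trees` is hypothetical, and L is held as a Python set
of frozensets so that the ⊕ of box 4 becomes a symmetric difference.

```python
def cancellation_of_non_trees(n, edges):
    """Return the spanning trees as frozensets of branch indices."""
    # B_j: indices of the branches incident to node j.
    incident = [[i for i, (u, v) in enumerate(edges) if node in (u, v)]
                for node in range(n)]
    candidates = set()                  # the list L
    chosen = []                         # a_1 .. a_{j-1}

    def level(j):
        if j == n - 1:                  # box 4: L = L (+) {{a_1, ..., a_{n-1}}}
            candidates.symmetric_difference_update({frozenset(chosen)})
            return
        for a in incident[j]:           # box 2: X_j = B_j - {a_1, ..., a_{j-1}}
            if a not in chosen:
                chosen.append(a)        # box 3
                level(j + 1)
                chosen.pop()            # box 5

    level(0)
    return candidates


if __name__ == "__main__":
    k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    print(len(cancellation_of_non_trees(4, k4)), "trees on the complete graph K4")
```

A candidate that contains a circuit is generated an even number of times and
so cancels, whereas each tree is generated exactly once.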

Figure 2: Cancellation of Non-Trees

5.8.  Circuit-Free Expansion

This algorithm ([Brownell 68], [Char 68], [Hobbs 59], [Mason 57]) differs
from the previous algorithm by avoiding the generation of non-trees (rather
than waiting to cancel them out of the list L). This algorithm rejects the
choice of any branch which forms a circuit with previously chosen branches.
At the lowest level, the tree candidate can be output immediately because any
set of n-1 branches which does not contain a circuit is a tree.

The flowchart for this algorithm appears in figure 3. Box 2 now reads
"X_j ← B_j - Circuit_Makers(a_1, a_2, ..., a_{j-1})". Box 4 now reads
"Output {a_1, a_2, ..., a_{n-1}}".
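The change to box 2 can be sketched as below, under the same assumptions as
before (nodes 0..n-1, last node omitted). The names `circuit_free_expansion`
and `makes_circuit` are hypothetical, and the circuit test shown is a simple
search over the chosen branches, not necessarily the test used in the cited
references.

```python
def circuit_free_expansion(n, edges):
    """Return the spanning trees, each found once, as frozensets of indices."""
    incident = [[i for i, (u, v) in enumerate(edges) if node in (u, v)]
                for node in range(n)]
    trees = []
    chosen = []

    def makes_circuit(a):
        """True if the endpoints of branch a are already joined by chosen branches."""
        start, goal = edges[a]
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            if x == goal:
                return True
            for i in chosen:
                u, v = edges[i]
                if x in (u, v):
                    y = v if x == u else u
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
        return False

    def level(j):
        if j == n - 1:                  # box 4: output immediately
            trees.append(frozenset(chosen))
            return
        for a in incident[j]:           # box 2: B_j minus the circuit makers
            if not makes_circuit(a):
                chosen.append(a)
                level(j + 1)
                chosen.pop()

    level(0)
    return trees


if __name__ == "__main__":
    k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    print(len(circuit_free_expansion(4, k4)), "trees on the complete graph K4")
```

A branch that has already been chosen joins its own endpoints, so the circuit
test also excludes the previously assigned branches.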

5.9.  Connected  Expansion 

This  algorithm  ([Berger  67],  [Cummins  64],  [Feussner  02,  04],  [Hirayama 
63],  [Minty  65],  [O'Neil  66a])  differs  from  the  previous  algorithm  in  the  method 
of  avoiding  non-trees.   Instead  of  testing  for  circuits,  this  algorithm  preserves 
connectedness.   The  references  cited  above  offer  a  variety  of  algorithms;  an 
efficient  representative  is  described  below. 

The flowchart for this algorithm appears in figure 4. The new variables
are Y_j (needed to avoid duplications) and p_j (representing the nodes in the
current connected subgraph). Box 1 has added the initializing statements
"Y_1 ← Branches of Graph" and "p_1 ← v_1". Box 2 now reads
"X_j ← Boundary{p_1, p_2, ..., p_j} ∩ Y_j", which means that X_j contains all
branches (in Y_j) which have exactly one endpoint belonging to
{p_1, p_2, ..., p_j}. This guarantees that any branch picked from X_j will
preserve connectedness and avoid circuits. Box 3 has added the statement
"p_{j+1} ← other endpoint (a_j)", which means that the endpoint of branch a_j
which is not already in {p_1, p_2, ..., p_j} is assigned to the node variable
p_{j+1}. Also in box 3 are the statements "remove a_j from Y_j" and
"Y_{j+1} ← Y_j", which limit the choice of branches at lower levels (see
box 2) in order to avoid duplications.
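The representative algorithm above can be sketched as follows. The function
name `connected_expansion` and the `start` parameter are hypothetical; the
sketch passes copies of Y_j down the recursion instead of maintaining the
flowchart's indexed variables.

```python
def connected_expansion(n, edges, start=0):
    """Return the spanning trees, each found once, as frozensets of indices."""
    trees = []

    def level(nodes, y, chosen):
        if len(chosen) == n - 1:        # box 4
            trees.append(frozenset(chosen))
            return
        # Box 2: X_j = Boundary{p_1..p_j} intersected with Y_j.
        x = [i for i in sorted(y)
             if (edges[i][0] in nodes) != (edges[i][1] in nodes)]
        y = set(y)
        for a in x:                     # box 3
            y.discard(a)                # remove a_j from Y_j
            u, v = edges[a]
            p_next = v if u in nodes else u          # other endpoint of a_j
            level(nodes | {p_next}, frozenset(y), chosen + [a])   # Y_{j+1} <- Y_j

    level(frozenset([start]), frozenset(range(len(edges))), [])
    return trees


if __name__ == "__main__":
    k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    print(len(connected_expansion(4, k4)), "trees on the complete graph K4")
```

Because a_j is removed from Y_j before the next iteration at level j, every
later iteration at that level produces only trees that exclude a_j, which is
what prevents duplicates.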

5.10.  Factoring 

This algorithm ([Ardon 69], [Chang 68], [Chen 68, 69c], [Cummins 64],
[Hirayama 65], [Holt 68], [Mason 57], [McIlroy 69], [Nakagawa 58], [Percival
53]) differs from the previous algorithm in that when node p_{j+1} is added to
the currently connected subgraph, all branches in X_j which are incident to
p_{j+1} are factored together into a single iteration. As a consequence, at the
lowest levels, instead of individual trees of n-1 branches, Cartesian products
of n-1 factors are produced.

The flowchart for this algorithm appears in figure 5. In box 3, "pick
A_j from X_j" and "p_{j+1} ← other endpoint (A_j)" mean that a_j is picked from
X_j and p_{j+1} is the other endpoint of a_j (as in the previous algorithm).
However, a_j is now extended to A_j, a factor set of branches:
A_j = {a_j} ∪ (X_j ∩ B_{p_{j+1}}); that is, A_j contains all branches in X_j
which are incident to p_{j+1}. All of A_j is removed from X_j. Similarly,
"remove A_j from Y_j" deletes the entire subset A_j from Y_j.

In box 4, "output A_1 × A_2 × ... × A_{n-1}" means that a family of trees is
output in the form of a Cartesian product of the factor sets A_j, 1 ≤ j ≤ n-1.
This factored form is adequate for the applications (see section 4.2), but if
individual trees are desired, they can be obtained by finding all combinations
of one branch from each of the n-1 factor sets (this Cartesian product
expansion could be accomplished by the flowchart in figure 1, with box 2:
"X_j ← A_j" and box 4: "Output {a_1, a_2, ..., a_{n-1}}").
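A sketch of the factored expansion follows, under the same assumptions as the
earlier sketches (nodes 0..n-1, branches as node pairs). The names `factoring`
and `expand_family` are hypothetical. Each family is returned as a tuple of
factor sets A_1, ..., A_{n-1}.

```python
from itertools import product
from math import prod


def factoring(n, edges, start=0):
    """Return families of trees as tuples of factor sets (A_1, ..., A_{n-1})."""
    families = []

    def level(nodes, y, factors):
        if len(factors) == n - 1:       # box 4: output A_1 x ... x A_{n-1}
            families.append(tuple(factors))
            return
        x = [i for i in sorted(y)
             if (edges[i][0] in nodes) != (edges[i][1] in nodes)]   # box 2
        y = set(y)
        while x:
            u, v = edges[x[0]]          # box 3: pick a_j
            p_next = v if u in nodes else u
            # A_j: every branch of X_j incident to p_{j+1}.
            factor = tuple(i for i in x if p_next in edges[i])
            x = [i for i in x if i not in factor]    # remove A_j from X_j
            y.difference_update(factor)              # remove A_j from Y_j
            level(nodes | {p_next}, frozenset(y), factors + [factor])

    level(frozenset([start]), frozenset(range(len(edges))), [])
    return families


def expand_family(family):
    """Individual trees of one family: one branch from each factor set."""
    return [frozenset(t) for t in product(*family)]


if __name__ == "__main__":
    k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    fams = factoring(4, k4)
    print(len(fams), "families,",
          sum(prod(len(f) for f in fam) for fam in fams), "trees")
```

On the complete graph on four nodes the sketch produces fewer families than
trees, which is the saving the factored form is meant to capture.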

5.11.  More  Factoring 

The idea behind this algorithm is to factor into a single iteration (the
last one) all those cases in which only one branch from X_j appears in a
tree. To avoid duplication, the other (earlier) iterations from X_j lead to
the choice (at level j+1) of an additional branch of X_j.

The flowchart for this algorithm appears in figure 6. The new variables
are d_j (a truth value which controls X_j) and Z_j (temporary storage for
X_j). Box 2 now reads "if d_j then Z_j ← X_j ← Boundary{p_1, p_2, ..., p_j} ∩ Y_j
else X_j ← X_{j-1}". This means that if d_j = YES, then X_j is calculated
as in the previous algorithm, and stored in Z_j. If d_j = NO, then X_j is
assigned the current value of X_{j-1} (since one branch was picked from X_{j-1}
at level j-1, this guarantees another branch from X_{j-1} at level j).

Box 3 has the additional statements "d_{j+1} ← (¬d_j) or (X_j = ∅)" and
"if d_j & d_{j+1} then A_j ← Z_j". Thus, if d_j = NO then d_{j+1} ← YES. If
d_j = YES and X_j is not empty, then d_{j+1} ← NO. If d_j = YES and X_j is
empty, then d_{j+1} ← YES and A_j is replaced by Z_j (the saved value of the
full set X_j before deletions).

J 

5.12.  Pruning 

In the algorithms of sections 5.9 through 5.11, branches are deleted from
the Y_j's. Even though the input graph was connected, deleted branches may
cause YA_j = Y_j ∪ (A_1 × A_2 × ... × A_{n-1}) to fail to connect all the nodes
(denote this situation by "YA_j fails"). Once YA_j fails, further computation
at levels j through n-1 is wasted because no spanning trees can be found on
a disconnected graph. Thus it would be useful to know when YA_j fails. On
the other hand, an additional connectedness test would be expensive because it
would be executed so many times.

The algorithm of this section differs from the three previous algorithms
in that needless computation is avoided when YA_j fails, but an additional
test for connectedness is not needed. This is accomplished by using "failure
to find trees" as a test for connectedness.

The flowchart for this algorithm appears in figure 7. The new variable
is k_j (the count of executed iterations from X_j). Box 2 initializes this
count: "k_j ← 0". Box 3 increments the count: "k_j ← k_j + 1".

The major change occurs in the NO branch of the "j=1?" decision box
where a further test is inserted: "k_j = 0?", which means "was X_j empty in
box 2?". If the answer is NO, then computation proceeds (as in previous
flowcharts) to box 5. If the answer is YES, then it is known that YA_j
fails, and computation proceeds to box 6 which reads "j ← j-1 until k_j > 1".
This means that control returns to the previous level (j ← j-1) and continues
to return to higher levels (pruning unnecessary iterations) until k_j > 1. The
remaining iterations at level j are then pruned by proceeding to box 5
(j ← j-1).

The idea behind box 6 is that the first iteration of box 3 does not change
the value of YA_{j+1} from that of YA_j. Thus the final value of j leaving
box 6 indicates the highest level at which YA_j failed.
J 

5.13.  Variations 

There  are  many  variations  of  the  algorithms  of  sections  5.7  through  5.12. 
Some  will  be  mentioned  very  briefly  in  this  section. 

There is an algorithm, Circuit Check, which is "half way" between
Cancellation of Non-Trees [5.7] and Circuit-Free Expansion [5.8]. This
algorithm is useful for analysis and will be described in section 6.4.

The algorithms of sections 5.7 and 5.8 can be generalized [Maxwell 66]
by replacing B_j with a somewhat more general cutset.

Sections 5.7 and 5.8 can be improved by labeling the nodes so that
degree(v_j) ≤ degree(v_{j+1}). For sections 5.9 through 5.12, a good heuristic
is to always choose p_j to be of minimal degree.

For graphs with b < 2(n-1), it may pay to find all co-trees (using some
form of duality) and convert them to trees.

Finally there are many special cases which can occur in graphs (either
initially or during computation) which can be handled more efficiently than the
general case. For example, the existence of separating nodes or separating
branches allows a quick decomposition.

6.    ANALYTICAL MEASUREMENTS OF SELECTED ALGORITHMS

The algorithms described in chapter 5 will now be measured by the
techniques described in section 3.3. A priori bounds (which do not depend on
the structure of an algorithm) are given in section 6.1. An example of worst
case analysis appears in section 6.2. In the remaining sections, only the
expansion algorithms [5.6 through 5.12] are analyzed, using the "computation
tree" defined in section 6.3. Section 6.4 employs direct comparisons of
consecutive algorithms. Section 6.5 introduces the "quotient operator" and uses
it to measure the New algorithm on "closed ladder" graphs. Finally, section 6.6
applies "complete graph analysis" in order to obtain upper bounds for the
factoring algorithms.

6.1.  A  Priori  Bounds 

Sometimes  it  is  possible  to  derive  bounds  for  an  algorithm  without  knowing 
its  structure.  If  an  algorithm  is  difficult  to  analyze,  a  priori  bounds  may  be 
the  tightest  available  bounds. 

Find-all-trees algorithms illustrate this point because the required
number of answers, t, grows exponentially. Any algorithm which finds trees
one at a time must have t = O(c) [recall from section 1.3 that "f(n,b,t) =
O(c)" means ∃A such that c ≥ A · f(n,b,t)]. For the algorithms of sections
5.4, 5.5, 5.8, and 5.9, tighter bounds are difficult to obtain.

Another example of "a priori" bounding occurs in the exhaustion algorithms
(section 5.1). The first algorithm checks all C(b, n-1) combinations of n-1
branches, so regardless of the details, b!/((b-n+1)! (n-1)!) = O(c). The second
algorithm checks each of the n^{n-2} [Cayley 89] trees of the complete graph,
so n^{n-2} = O(c). The storage required by the second algorithm is also larger
than n^{n-2}. These lower bounds are sufficient to demonstrate the inefficiency
of these algorithms.

6.2.  Worst Case

This section will illustrate "worst case analysis" as applied to the first
exhaustion algorithm [5.1]. Let the branches of the graph be numbered 1 through
b. Represent each combination of branches by an ordered list of n-1 positions,
p_j, each position containing a branch number (p_j = i, where 1 ≤ i ≤ b). In
order to avoid duplications, require that p_j < p_k for j < k. In order for this
condition to be satisfied for all j and k, each p_j must be restricted to
values from a set of b-n+2 branch numbers, namely j ≤ p_j ≤ b-n+1+j
(1 ≤ j ≤ n-1). Assume that the algorithm is equivalent to n-1 nested iteration
loops, each loop corresponding to a position in the ordered list. Since each
loop goes through a maximum of b-n+2 iterations, the code in the innermost loop
is executed no more than (b-n+2)^{n-1} times. If that code (a tree test) costs
O(n), then c = O(n · (b-n+2)^{n-1}).
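The bound can be compared numerically with the exact number of combinations
the first exhaustion algorithm examines. The sketch below is only a worked
check of the arithmetic; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def position_range(j, n, b):
    """Allowed branch numbers for position p_j: j <= p_j <= b-n+1+j."""
    return j, b - n + 1 + j


def combinations_checked(n, b):
    """Exact number of (n-1)-branch combinations: b! / ((b-n+1)! (n-1)!)."""
    return comb(b, n - 1)


def worst_case_iterations(n, b):
    """Bound on innermost-loop executions: (b-n+2)^(n-1)."""
    return (b - n + 2) ** (n - 1)


def increasing_lists(n, b):
    """All ordered lists with p_1 < p_2 < ... < p_{n-1} over branches 1..b."""
    return list(combinations(range(1, b + 1), n - 1))


if __name__ == "__main__":
    n, b = 5, 10                        # complete graph on 5 nodes
    print("combinations:", combinations_checked(n, b))
    print("loop bound:  ", worst_case_iterations(n, b))
    print("cost bound:  ", n * worst_case_iterations(n, b))
```

The gap between the two counts shows how loose the nested-loop argument is,
since most settings of the loop variables violate p_j < p_k.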

6.3.  Computation  Trees 

A computation tree, CT(G, A), traces the execution of algorithm A
applied to graph G. The expansion algorithms have computation trees of height
n, as shown in figure 8. The meaning of CT(G, A) and the definition of its
size parameters, c_i(G, A), are as follows.

The nodes of CT(G, A) are arranged in n levels, directly corresponding
to the n levels of the flowchart of A. At the bottom (i.e., level n) of
CT(G, A), there are c_4(G, A) nodes, each corresponding to an execution of
box 4 ("process {a_1, a_2, ..., a_{n-1}}") of the flowchart of A. Each of the
other c_2(G, A) nodes corresponds to an execution of box 2 ("compute X_j").
There are c_3(G, A) branches in CT(G, A), each connecting a node on level
j to one on level j+1 (corresponding to an execution of box 3: "pick a_j
from X_j"). Since the number of nodes equals one plus the number of branches,
c_2(G, A) + c_4(G, A) = 1 + c_3(G, A). The arguments G and A may be modified
or dropped when no confusion would result.

The cost of algorithm A applied to G can be expressed in terms of
these size parameters. Using the last equality and ignoring very small costs,
c = c_2(G, A)[cost(box 2) + cost(box 3)] + c_4(G, A)[cost(box 4) + cost(box 3)].

Using this formula, a worst case bound for the Cancellation of Non-Trees
(CNT) algorithm of section 5.7 can be derived as follows. First, cost(box 4) =
cost(compare two sets) · (# of sets in L). Intuitively and in practice,
(# of sets in L) = O(t). Since cost(compare) = O(n), cost(box 4) = O(nt).
Since this dominates the costs of the other boxes, only c_4(G, CNT) needs to
be calculated. Now, from the flowchart (figure 2), X_j ⊆ B_j. Therefore, in
CT(G, CNT) each node at level j leads to (at most) degree(v_j) nodes at
level j+1. Thus c_4(G, CNT) ≤ Π_{j=1}^{n-1} degree(v_j). To express c_4 in
terms of n and b, first use the "geometric vs. arithmetic mean" inequality
[i.e., x_1 · x_2 ··· x_n ≤ ((x_1 + x_2 + ... + x_n)/n)^n], to obtain
c_4(G, CNT) ≤ ((1/(n-1)) Σ_{j=1}^{n-1} degree(v_j))^{n-1}. Assuming
degree(v_n) ≥ degree(v_j) [the most efficient case], c_4(G, CNT) ≤ (2b/n)^{n-1}.
Thus c = O(nt(2b/n)^{n-1}).
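The two bounds on c_4(G, CNT) can be evaluated directly for a given graph. The
sketch below is a worked check of the inequalities only; the function name
`cnt_leaf_bounds` is hypothetical, and the node of largest degree is taken as
the omitted node v_n, as the derivation assumes.

```python
from math import prod


def cnt_leaf_bounds(n, edges):
    """Return (product of degrees of v_1..v_{n-1}, (2b/n)^(n-1))."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    degree.sort()                       # v_n (omitted) gets the largest degree
    degree_product = prod(degree[:n - 1])
    mean_bound = (2 * len(edges) / n) ** (n - 1)
    return degree_product, mean_bound


if __name__ == "__main__":
    k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    star = [(0, 1), (0, 2), (0, 3)]
    print("complete graph:", cnt_leaf_bounds(4, k4))
    print("star:          ", cnt_leaf_bounds(4, star))
```

On a regular graph the two bounds coincide, since the arithmetic and geometric
means of equal degrees are the same.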

In the analysis to follow, it will be convenient to use the notation
ST(G, A, w) to denote the subtree of CT(G, A) consisting of the node w
[w ∈ CT(G, A)] and all the nodes and branches connected to w from below.
If w_1 is the root node, ST(G, A, w_1) = CT(G, A). Two subtrees,
ST(G_1, A_1, w_1) and ST(G_2, A_2, w_2), are isomorphic if there is a one-to-one
and onto mapping of the nodes and branches which preserves incidence and level
relationships.

6.4.  Direct  Comparisons 

The expansion algorithms (5.7 through 5.12) will now be sequentially
compared in terms of their computation trees. The expression "c_i(A_1) ≤ c_i(A_2)"
means that for all graphs G, and for each size parameter c_i (i = 2, 4),
c_i(G, A_1) ≤ c_i(G, A_2).

The Circuit-Free Expansion (CFE) algorithm [5.8] introduces two changes
from the previous algorithm (CNT). First, the "non-tree test" changes from
"check L for duplicates" to "check for circuits". Second, this test has been
moved up from box 4 to box 2. In order to isolate the change in efficiency, let
us make these changes one at a time.

If the second change is made without the first [Piekarski 65], the cost
increases [based on limited empirical evidence]; therefore, let us try the
first change without the second. Call this the Circuit Check algorithm (CtC).
Since the only change is in box 4 ("check {a_1, a_2, ..., a_{n-1}} for
circuits"), the computation tree does not change: c_i(CtC) = c_i(CNT). However,
cost(box 4) drops from O(nt) to O(n) [the cost of a circuit test]. Clearly,
Circuit Check is more efficient than Cancellation of Non-Trees.

Now add the second change. The cost of each box is O(n) regardless of
the placement of the circuit test (only the constants change). However,
c_i(CFE) ≤ c_i(CtC) because the non-trees are discovered sooner, and needless
computation is avoided. For nearly all graphs G, c_i(G, CFE) < c_i(G, CtC).
Thus the second change is an improvement also.

The Connected Expansion (Con) algorithm [5.9] will be considered equally
efficient as Circuit-Free Expansion. Both algorithms find trees one at a time,
avoiding non-trees and duplications, so c_4(CFE) = c_4(Con) = t. Empirically,
Circuit-Free Expansion appears more efficient [Fernandez 69a].

The Factoring (Fac) algorithm [5.10] is clearly an improvement over "one
tree at a time" algorithms. Each factor A_j = {a_j^1, a_j^2, ..., a_j^k} (box 3,
figure 5) corresponds to a node w_{j+1} in CT(G, Fac); i.e., A_j corresponds
to the entire subtree ST(G, Fac, w_{j+1}). Each a_j^i corresponds to a node
w_{j+1}^i in CT(G, Con). Therefore, ST(G, Fac, w_{j+1}) replaces k subtrees
ST(G, Con, w_{j+1}^i), i = 1, ..., k. Clearly c_i(G, Fac) ≤ c_i(G, Con), with
equality holding only if G is a tree. Typically, c_i(G, Fac)/t (the "cost per
tree") goes to zero exponentially as n increases [6.6, figure 10].

For each X_j calculated in box 2 of the Factoring algorithm, the trees
which contain just one branch from X_j will be calculated k times over,
where k is the number of iterations necessary to empty X_j. The More
Factoring (MF) algorithm [5.11] combines these k cases into a single iteration,
clearly an improvement in efficiency. Thus c_i(G, MF) ≤ c_i(G, Fac), with
equality holding rarely. Typically, c_i(MF)/c_i(Fac) goes to zero exponentially
as n increases [6.6]. An intuitive indication of the improvement is that the
factors are larger; i.e., on a complete graph, Factoring will always find at
least one Cartesian product family consisting of a single tree [Chang 68], while
(if n>3) every family found by More Factoring will contain at least two trees.

The Pruning algorithm [5.12] is clearly an improvement. The test for
connectedness is obtained at negligible cost, but the potential savings are
large. Naming this algorithm (with factoring and pruning, [figure 7]) the
New algorithm, c_i(G, New) ≤ c_i(G, Fac).

6.5.   Special  Graphs:   The  Quotient  Operator 

This section (as well as the next) will illustrate the technique of
measuring the cost of an algorithm on a parameterized class of graphs,
G = G(p). For the algorithms to be measured, c = k_2 c_2(G) + k_4 c_4(G) with
k_i = O(n); thus, the only quantities which need to be measured are c_2(G)
and c_4(G). Since G = G(p), c_i(p) will replace c_i(G). Only algorithms
with factoring will be measured directly; the "one at a time" algorithms [5.4,
5.8, 5.9] have c_4(p) = t(p), the number of trees as a function of p.

For the classes of graphs to be considered, c_2(p), c_4(p), and t(p)
are all exponential in p. In order to derive, compare, and plot these
functions, f(p), the "quotient operator", Q(f, p), will be used: Q(f, 1) =
f(1), and for p>1, Q(f, p) = f(p)/f(p-1) [it is not difficult to interpret
f(1) ≥ 1, thus Q(f, 2) is well defined]. Clearly, f(p) = Π_{i=1}^{p} Q(f, i).
For the functions to be considered, there will always exist a "quotient limit",
q(f, p), either a constant or a linear function of p, such that
lim_{p→∞} Q(f, p)/q(f, p) = 1. For example, q(p!, p) = p, q(k^p, p) = k,
q(k^p, k) = 0.
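The quotient operator is simple enough to state as code. The sketch below uses
exact rational arithmetic so that the product identity can be checked without
rounding; the function names `Q` and `rebuild` are hypothetical choices.

```python
from fractions import Fraction
from math import factorial, prod


def Q(f, p):
    """Quotient operator: Q(f, 1) = f(1), and Q(f, p) = f(p)/f(p-1) for p > 1."""
    if p == 1:
        return Fraction(f(1))
    return Fraction(f(p), f(p - 1))


def rebuild(f, p):
    """Recover f(p) as the product of Q(f, i) for i = 1..p."""
    return prod(Q(f, i) for i in range(1, p + 1))


if __name__ == "__main__":
    print([int(Q(factorial, p)) for p in range(1, 7)])    # quotients of p!
    print([int(Q(lambda p: 3 ** p, p)) for p in range(1, 7)])
    print(rebuild(factorial, 6) == factorial(6))
```

For f(p) = p! the quotient equals p at every p > 1, and for f(p) = k^p it
equals k, in agreement with the first two quotient limits quoted above.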

As a first example of a special class of graphs, consider L(r), the
closed ladder of r rungs (see figure 9). Since n(r) = 2r and b(r) = 3r-2,
this example will show that even on graphs with rank > nullity [i.e.,
b < 2(n-1)], the New algorithm has cost per tree, c/t, going to zero
exponentially. To prove this claim, it suffices to show that
q(c_i(r, New), r)/q(t, r) < 1.

It is not difficult to derive the equation t(r) = 4 t(r-1) - t(r-2),
with t(1) = 1, t(2) = 4. In fact, t(r) = t(i+1) · t(r-i) - t(i) · t(r-i-1),
for any i, 1 ≤ i ≤ r-2. From this equation it is not difficult to show that
q(t, r) = 2 + √3.
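The recurrence, the product identity and the quotient limit can all be checked
numerically. The sketch below is a worked check only; the function name
`ladder_tree_counts` is hypothetical.

```python
from math import sqrt


def ladder_tree_counts(r_max):
    """t(r) for r = 1..r_max from t(r) = 4 t(r-1) - t(r-2), t(1)=1, t(2)=4."""
    t = {1: 1, 2: 4}
    for r in range(3, r_max + 1):
        t[r] = 4 * t[r - 1] - t[r - 2]
    return t


def identity_holds(t, r):
    """Check t(r) = t(i+1) t(r-i) - t(i) t(r-i-1) for every 1 <= i <= r-2."""
    return all(t[r] == t[i + 1] * t[r - i] - t[i] * t[r - i - 1]
               for i in range(1, r - 1))


if __name__ == "__main__":
    t = ladder_tree_counts(20)
    print([t[r] for r in range(1, 7)])
    print("identity holds:", all(identity_holds(t, r) for r in range(3, 21)))
    print("Q(t, 20) =", t[20] / t[19], " 2 + sqrt(3) =", 2 + sqrt(3))
```

The quotient converges quickly because the second root of x^2 = 4x - 1, namely
2 - √3, is much smaller than 1.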

The New algorithm on L(r) starts at p_1, one of the four corners
(nodes of degree 2). There will be two iterations at the first level of the
computation tree CT(r) [see figure 9].

The first iteration handles the case in which both branches incident to
p_1, the starting node, are included in a tree. Thus at level 3, the three
branches in X_3 [see figure 9] will reach only two nodes. Because of factoring,
the remaining computation from X_3 is just like starting at a corner of L(r-1).
If w_3 is the node in CT(r) which corresponds to the calculation of X_3,
then ST(r, w_3) is isomorphic to CT(r-1).

The second iteration at level 1 handles the case in which only one of the
branches incident to p_1 will be included in a tree. Assuming p_2 is chosen
to be the corner adjacent to p_1, then there is only one branch in X_2, and
this leads to a corner of L(r-1) [see figure 9]. Once again, if w' is the
node of CT(r) immediately below this iteration of X_2, then ST(r, w') is
isomorphic to CT(r-1).

From the above information, it is easy to derive the relationships
c_4(r) = 2 · c_4(r-1), c_2(r) = 2 · c_2(r-1) + 3, and c_4(1) = c_2(1) = 1.
Applying the quotient operator and taking limits yields q(c_i, r) = 2. The
claim is proved and c_i(r, New) = O(2^r).
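These recurrences can be tabulated alongside t(r) to see the cost per tree
shrink. The sketch below is a numerical illustration of the stated recurrences
only and does not simulate the New algorithm itself; the function name
`ladder_costs` is hypothetical.

```python
def ladder_costs(r_max):
    """Return (c4, c2, t) tables for the closed ladder L(r), r = 1..r_max."""
    c4, c2 = {1: 1}, {1: 1}
    t = {1: 1, 2: 4}
    for r in range(2, r_max + 1):
        c4[r] = 2 * c4[r - 1]           # c_4(r) = 2 c_4(r-1)
        c2[r] = 2 * c2[r - 1] + 3       # c_2(r) = 2 c_2(r-1) + 3
        if r > 2:
            t[r] = 4 * t[r - 1] - t[r - 2]
    return c4, c2, t


if __name__ == "__main__":
    c4, c2, t = ladder_costs(12)
    for r in (2, 4, 8, 12):
        print(r, c4[r], c2[r], t[r], c4[r] / t[r])
```

Since the quotient limits are 2 for c_i and 2 + √3 for t, the ratio c_4(r)/t(r)
falls by roughly a factor of 2/(2 + √3) with each added rung.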

6.6.  Complete  Graphs 

This section will measure the computation tree of algorithms with
factoring [5.10, 5.11] applied to the complete graph on n nodes (G = S(n)).
This test is significant as an "upper bound"; that is, for all graphs G on
n nodes, c_i(G, A) ≤ c_i(S(n), A). Because of factoring, this inequality remains
true even if G has parallel branches. For non-factoring algorithms, the
statement would have to be modified to exclude parallel branches.

CT(S(n), Fac) has n-1 iterations (branches) at the highest level
because in S(n), the starting node p_1 is adjacent to each of the n-1 other
nodes. As these iterations are executed, Y_1 [the set of available branches in
S(n)] will be diminished only by branches incident to p_1 [see figure 5].
Therefore, Y_1 will always contain a complete graph on the n-1 nodes excluding
p_1. This completeness implies that for each node w at the second level of
CT(n, Fac), ST(n, Fac, w) is isomorphic to CT(n-1, Fac).

From this information, it is easy to derive the following relationships:
c_4(n, Fac) = (n-1) c_4(n-1, Fac); c_2(n, Fac) = (n-1) c_2(n-1, Fac) + 1;
c_2(1) = c_2(2) = c_4(1) = c_4(2) = 1. Obviously, c_4(n, Fac) = (n-1)!, so
q(c_4, n) = n-1. Q(c_2, n) = [(n-1) c_2(n-1) + 1]/c_2(n-1) = n-1 + 1/c_2(n-1).
Since lim_{n→∞} (n-1 + 1/c_2(n-1))/(n-1) = 1, q(c_2, n) = n-1. Thus the
quotient limits for c_2 and c_4 are equal. The only effect of the "+1" term in
the recursive formula for c_2 is that lim_{n→∞} c_2(n)/c_4(n) = e.

The complete graph analysis of the New algorithm is slightly more
complicated. In box 2 [figure 6], if d_j = YES, then X_j is calculated as
in the previous Factoring algorithm [figure 5]. The analysis of this (d_j = YES)
case is similar to the analysis of the Factoring algorithm; if w_j is the
corresponding node in CT(n, New), then ST(n, New, w_j) is isomorphic to
CT(n+1-j, New).

Since d_1 = YES, X_1 will reach each of the n-1 nodes adjacent to p_1,
the starting node. Let w_1 be the root of CT(n, New) and let w_2^i (i = 1,
..., n-1) be the nodes at level 2. For w_2^{n-1}, d_2 = YES, so
ST(n, New, w_2^{n-1}) is isomorphic to CT(n-1, New). For 1 ≤ i ≤ n-2, let
w_3^{i,k(i)} be the nodes at level 3 directly connected to w_2^i. For w_2^i
(1 ≤ i ≤ n-2), d_2 = NO, but for w_3^{i,k(i)}, d_3 = YES; so
ST(n, New, w_3^{i,k(i)}) is isomorphic to CT(n-2, New). To compute
Σ_{i=1}^{n-2} k(i), note that X_2 (corresponding to w_2^i) is assigned [box 2,
figure 6] the current (depleted) value of X_1. Thus there will be as many
iterations of X_2 [k(i)] as there are remaining iterations of X_1 [n-1-i].
Thus Σ_{i=1}^{n-2} k(i) = Σ_{i=1}^{n-2} (n-1-i) = (n-2)(n-1)/2.

From this information, it is easy to derive recursive formulas for
c_i(n, New): c_4(n) = c_4(n-1) + ((n-2)(n-1)/2) c_4(n-2);
c_2(n) = c_2(n-1) + ((n-2)(n-1)/2) c_2(n-2) + n-1;
c_2(1) = c_2(2) = c_4(1) = c_4(2) = 1; c_2(3) = 3.

Applying the quotient operator to these recursive equations yields
Q(c_4, n) = 1 + (n-2)(n-1)/(2 Q(c_4, n-1)). To show that q(c_4, n) = (n-1)/√2,
let m = lim_{n→∞} Q(c_4, n) · √2/(n-1). From the previous equation,
Q(c_4, n) · √2/(n-1) = √2/(n-1) + (n-2)/(√2 Q(c_4, n-1)). Taking the limit as
n→∞, m = 0 + 1/m; i.e., m = 1, so by definition, q(c_4, n) = (n-1)/√2. The
proof that q(c_2, n) = (n-1)/√2 is similar (an additional term goes to zero).


The quotient limit for the Circuit Check algorithm [6.4] can be derived
from the fact that for S(n), b = n(n-1)/2. Thus

\[ c_4(S(n), \mathrm{CtC}) = \left(\frac{2b}{n}\right)^{n-1} = (n-1)^{n-1}, \qquad Q(c_4, n) = \frac{(n-1)^{n-1}}{(n-2)^{n-2}} = (n-1)\left(\frac{n-1}{n-2}\right)^{n-2}. \]

Since \(\lim_{n \to \infty} \left(\frac{n-1}{n-2}\right)^{n-2} = e\), \(q(c_4, n) = (n-1)e\).

The quotient limit for \(t(n) = n^{n-2}\) is similarly derived:

\[ Q(t, n) = \frac{n^{n-2}}{(n-1)^{n-3}} = \left(n-2+\frac{1}{n}\right)\left(\frac{n}{n-1}\right)^{n-1}; \qquad \lim_{n \to \infty} \frac{\left(n-2+\frac{1}{n}\right)\left(\frac{n}{n-1}\right)^{n-1}}{(n-2)e} = 1; \]

therefore, \(q(t, n) = (n-2)e\).
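
Both quotient limits can be checked with exact integer arithmetic. This is a sketch under the assumption that Q(f, n) = f(n)/f(n-1) and that the closed forms \((n-1)^{n-1}\) and \(n^{n-2}\) are as read from the text; the helper names are hypothetical.

```python
import math


def quotient(f, n):
    """Q(f, n) = f(n) / f(n - 1), using exact integers before the division."""
    return f(n) / f(n - 1)


def c4_ctc(n):
    """Cost of Circuit Check on S(n), taken as (n-1)^(n-1)."""
    return (n - 1) ** (n - 1)


def t(n):
    """Number of spanning trees of the complete graph on n nodes, n^(n-2)."""
    return n ** (n - 2)


for n in (10, 100, 1000):
    ratio_ctc = quotient(c4_ctc, n) / ((n - 1) * math.e)
    ratio_t = quotient(t, n) / ((n - 2) * math.e)
    print(n, round(ratio_ctc, 5), round(ratio_t, 5))
```

Both ratios should approach 1 from below as n increases, consistent with q(c_4, n) = (n-1)e and q(t, n) = (n-2)e.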

Figure 10 plots \(c_4(n, \mathrm{CtC})\), \(t = c_4(n, \mathrm{CFE}) = c_4(n, \mathrm{Con})\), \(c_4(n, \mathrm{Fac})\),
and \(c_4(n, \mathrm{New})\) using the quotient operator Q(f, n). The quotient limits
derived above are the asymptotic limits of the plotted functions.

From this analysis, it is obvious that the cost per tree, c/t, goes to
zero exponentially (\(e^{-n}\)) for the Factoring algorithm, and the cost ratio of
the New algorithm to Factoring also goes to zero exponentially (\(\sqrt{2}^{\,-n}\)). Thus,
the most efficient way to find trees one at a time is to use the New algorithm
combined with a simple Cartesian product expansion algorithm [cost (New) < cost
(simple expansion) < cost (other "one at a time" algorithms)].
Figure 10:   Complete Graph Analysis

7.     CONCLUSIONS

One  important  contribution  of  this  thesis  is  the  efficiency  analysis  of 
all  published  algorithms  for  finding  all  trees  of  a  graph.   A  very  rough  summary 
of  this  analysis  is  as  follows: 

    Algorithm                     Cost

    "check for duplications"      \(t^2\)
    "one tree at a time"          \(t\)
    Factoring                     \(t e^{-n}\)
    New                           \(t (e\sqrt{2})^{-n}\)

Note that for the algorithms with factoring, the cost per tree goes to zero
exponentially as n increases.

The  techniques  which  were  used  to  measure  efficiency  include  the 
following:   (1)  the  use  of  special  classes  of  graphs  on  which  the  cost  of  an 
algorithm  can  be  accurately  measured  (e.g.,  complete  graphs);   (2)  the  direct 
comparison  (e.g.,  using  computation  trees)  of  competing  algorithms  in  order  to 
show  differences  in  efficiency  without  the  need  to  derive  individual  bounds; 
(3)  the  isolation  of  each  idea  of  an  algorithm  (e.g.,  factoring)  so  that  the 
efficient  ideas  can  be  available  for  the  development  and  analysis  of  new 
algorithms;   (4)  the  minimization  of  implementation  details  in  empirical 
measurements  (e.g.,  using  GASP  and  counting  statements  rather  than  seconds); 
(5)  the  use  of  measures  which  reflect  the  nature  of  the  class  of  algorithms 
(e.g., the quotient operator which linearizes the exponential nature of
"recursive" algorithms).

The  New  algorithm  is  an  important  contribution  of  this  thesis  primarily 
because  these  techniques  show  that  it  is  more  efficient  than  any  previous 
algorithm  for  finding  all  trees. 
APPENDICES

1.    GASP  MANUAL 

1.1.  Purposes  of  GASP 

The main purpose of the Graph Algorithm Software Package is to allow
programmers of graph algorithms to code programs in a natural and machine
independent way. Because operations are expressed in a language of graph and
set terms, the programs will be easy to follow and estimates of the amount of
computation will be easier to compute. Comparisons among different algorithms
for the same problem will be much easier using GASP because it is easy to
generate programs from their description in conventional graph-theoretic terms.
Moreover, some tallies on the amount of computation are provided by GASP.

The  logical  structure  of  GASP  makes  it  possible  to  change  representations 
with  relatively  little  change  in  programs.   A  few  low-level  routines  would  have 
to  be  rewritten,  but  the  higher  level  programs  would  not. 

1.2.  Basic  Concepts  and  Terminology 

1.2.1.  Data  Types 

An  integer  has  the  usual  definition.   A  character  string  is  a  fixed-length 
string  of  characters.   A  truth  value  is  a  variable  which  can  take  on  one  of  the 
two  values:   yes  or  no.   A  name  references  an  object  (see  below).   An  object  is 
a  conglomeration  consisting  of  one  integer,  one  character  string,  three  names, 
and (most important) one set. A set can exist only as part of an object. There
are two types of objects: restricted and unrestricted. Restricted objects cannot
belong to sets.

1.2.2.  Definitions  and  Assignments 
GASP objects are available to the programmer only through one level of
indirect addressing. The programmer deals with names which refer to objects
which have values. This relationship (shown in figure 11) is very important
for the understanding of GASP.


    name --(definition)--> object --(assignment)--> values (for set, etc.)

Figure 11

In  order  to  distinguish  one  level  from  the  other,  we  will  use  two  sets 
of  words.   A  name  is  defined  if  it  references  an  existing  object,  otherwise  it 
is  undefined.   A  name  may  change  its  definition;  that  is,  it  can  be  made  to 
refer  to  a  different  object.   The  number  of  names  referring  to  an  object  may 
vary  from  zero  to  any  reasonable  positive  integer. 

When  the  contents  of  a  set  are  changed,  we  say  the  set  is  assigned  a 
new  value.   Also,  the  object  involved  is  assigned  a  new  value. 

The  main  advantage  of  this  indirect  addressing  scheme  is  that  not  all 
objects  need  to  be  accessed  by  permanent  names.   E.g.,  all  objects  in  a  set 
S  may  be  made  accessible  by  means  of  the  statement  "FOR  (ALL,  X,  S)",  even  if 
no  object  in   S   has  previously  been  given  a  name.   In  this  case,  we  regard 
the  bound  variable  X  as  a  name  whose  definition  ranges  over  all  the  objects 
in  the  set   S. 
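
The distinction between definition and assignment can be illustrated outside GASP. The sketch below uses Python references as a stand-in for GASP names; the class `GaspObject` and its field names are assumptions made for illustration and do not reproduce the PL/1 layout.

```python
class GaspObject:
    """Illustrative stand-in for a GASP object (not the PL/1 layout)."""

    def __init__(self, label):
        self.charp = label   # character string part
        self.sset = set()    # set part


s1 = GaspObject("S1")         # definition: the name s1 now refers to a new object
s2 = s1                       # definition: s2 refers to the same object as s1
s2.sset.add("E1")             # assignment: the shared object's set gets a new value
shared_value = set(s1.sset)   # the new value is visible through either name

s2 = GaspObject("T")          # redefinition of s2; the first object is untouched
```

After the last line, s1 still refers to the object whose set holds E1, while s2 refers to a fresh object with an empty set. Changing a definition never changes a value, and changing a value never changes a definition.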


1.2.3.   Graphs 

A  graph  is  represented  as  an  object  whose  set  contains  the  branches  and 
nodes  belonging  to  the  graph.   Each  node  and  each  branch  is  an  unrestricted 
object. In this first implementation, the set associated with a branch is the
set of two incident nodes; the set associated with a node is the set of all
incident branches and adjacent nodes.

1.2.4.   System Objects

GASP  reserves  a  few  restricted  objects  for  special  use.   NULLSET  is  a 
read-only  object  whose  set  is  always  empty.   AC  is  an  object  whose  set  holds 
intermediate  values  of  set  operations.   USED  is  the  object  whose  set  contains 
all  currently  active  unrestricted  objects.   NODES  and  BRANCHES  are  the  objects 
whose  sets  contain  all  nodes  and  branches  (respectively)  belonging  to  the 
union  of  all  graphs. 

1.2.5.   GASP  Statement  Forms 

GASP is an extension of PL/1 (through the use of the PL/1 Preprocessor).
Thus any PL/1 statement could be considered a GASP statement. The 'pure' GASP
statements fall into three categories: (1) PL/1 statements which declare or assign
values to GASP data types; (2) GASP procedure calls, which constitute a complete
statement starting with CALL (unless the procedure name begins with $) and
ending with a semicolon; (3) type-functions, where type is one of the following:
name, integer, truth value. A type-function call can be inserted almost anywhere
a 'type' variable is allowed.

1.3.   The  GASP  Statements  for  Sets 

1.3.1.   Notation 

In the instruction set that follows, the actual code that must appear
(as spelled) is capitalized while arbitrary names used as arguments are not.
GASP statements will frequently be set off by quotation marks which are obviously
not part of the code.

1.3.2.  Declarations

GASP variables are declared just as regular PL/1 variables are declared.
Conversion from terms in section 1.2 to actual program words is shown below.

    Formal Description                 Computer Code

    name (and unrestricted object)     $NAME
    integer                            $INTEGER
    character string                   $CHAR
    truth value                        $BIT
    restricted object                  $MAXSET

NOTE:   Declaring a name (or unrestricted object) does not define it. However,
'DCL x $MAXSET;' will create a restricted object and that object will be the
definition of x. $MAXSET is the only declaration which cannot be factored;
'DCL (S1, S2) $MAXSET;' would result in both names S1 and S2 referring to
the same object.

1.3.3.  Definitions 

A name, x, can be defined in two other ways besides 'DCL x $MAXSET;'
[previous paragraph].

'$ALLOC (x);' creates (storage for) a new unrestricted object which
will become the definition of the name x (previously declared $NAME). Also
the character string part of this object will be assigned the value 'x'.

'x = name-expression;' will define (or redefine) the name x to refer
to the object named by name-expression (name-expression can be either a previously
defined name or an arbitrary expression which computes a name value).

1.3.4.  Freeing  of  Storage 

The storage taken up by an object can be freed as follows:

'$KILL (x);' will free the unrestricted object named x and the name
x will become undefined.

'CALL POP (x);' will free the restricted object named x and leave
x undefined.

1.3.5.   Operations  on  Sets 

1.3.5.1.  Notation 

Nearly all arguments will be defined names. In the following examples,
those names beginning with 's' are to be considered as sets (of any object)
and those beginning with 'e' should be considered as elements of sets ('e'
names must refer to unrestricted objects). As is the case with most GASP
operations, no names are changed by the instructions in this section.

1.3.5.2.  Truth  Value  Functions 

'$IS_IN(e, s)' answers "does e belong to s?".
'$EQUALS(s1, s2)' answers "does set s1 = set s2?".
'$EMPTY_(s)' is equivalent to '$EQUALS(s, NULLSET)'.

1.3.5.3.  Procedure  Calls 

'$STORES (sname, s_expression);' will assign to the set named sname
the value of the set named s_expression (which remains unchanged).

'$CLEAR(s);' is equivalent to '$STORES (s, NULLSET);'.

'$CHANGES (s, elem, op)', where op = ADD or DELETE, will add (delete)
elem to (from) the set s. If this does not change the truth value of the
expression "elem ∈ s", then it is a harmless waste of time.

'$CSES (s, e);' (Clear and Store Element in Set) assigns to s the
value {e}.
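
The effect of these procedure calls can be modelled with ordinary mutable sets. The following is a sketch only, assuming Python sets in place of GASP set parts; the lower-case function names mirror the GASP statements but are hypothetical and are not GASP code.

```python
def stores(target, source):
    """Model of $STORES: target receives the value of source; source is unchanged."""
    new_value = set(source)   # copy first, so stores(s, s) is harmless
    target.clear()
    target.update(new_value)


def changes(s, elem, op):
    """Model of $CHANGES: add or delete elem; a no-op if membership already matches."""
    if op == "ADD":
        s.add(elem)
    elif op == "DELETE":
        s.discard(elem)
    else:
        raise ValueError("op must be ADD or DELETE")


def cses(s, e):
    """Model of $CSES: clear s, then store the single element e."""
    s.clear()
    s.add(e)


s1, s2 = set(), set()
cses(s1, "e1")                 # s1 = {e1}
changes(s1, "e2", "ADD")       # s1 = {e1, e2}
changes(s1, "e2", "ADD")       # harmless repeat
changes(s1, "e3", "DELETE")    # harmless: e3 was never a member
stores(s2, s1)                 # s2 gets the value of s1
changes(s2, "e1", "DELETE")    # changing s2 does not change s1
```

Note that `stores` copies a value rather than sharing an object, which is the assignment level of section 1.2.2, not the definition level.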

1.3.5.4.  Integer  Functions 

'CARD(s)'  returns  the  integer  number  of  elements  belonging  to   s. 

1.3.5.5.  Name  Functions  (Choice) 

The functions in this section pick elements out of sets with varying
side effects.

'ELEM_OF(s)' will return the name of an object belonging to the set s.
This statement should not be used unless it is known that s is not empty.
The set s is unchanged by this instruction.

'CAN_PIC (e, s)' is a truth value function which will answer "CAN one
PICk an element from s?". If set s is not empty, e will name an object
belonging to s, which will then be deleted from s. If s is empty, e
will be undefined.

'ITH_EL (s, i)' will return (without deleting) the i-th element of the
set s, where i is an integer. Since this depends on the arbitrary (but
fixed) ordering of elements in the implementation of the set, it has little use.

'RANDEL(s)' will return a randomly chosen element from the set s, without
deleting it.
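
The difference between choosing with and without deletion can be sketched as follows. This is an illustrative model using Python sets; returning a pair from `can_pic` is an assumption made here because Python cannot define a caller's name the way the GASP function defines e.

```python
import random


def elem_of(s):
    """Model of ELEM_OF: return some element of a non-empty set, leaving s unchanged."""
    return next(iter(s))


def can_pic(s):
    """Model of CAN_PIC: (True, e) with e removed from s, or (False, None) if s is empty."""
    if not s:
        return False, None
    return True, s.pop()


def randel(s):
    """Model of RANDEL: a randomly chosen element of a non-empty set, not deleted."""
    return random.choice(sorted(s))


work = {"n1", "n2", "n3"}
peeked = elem_of(work)        # work still has three elements
picked = []
while True:
    ok, e = can_pic(work)     # each successful call shrinks work by one element
    if not ok:
        break
    picked.append(e)
```

The loop terminates because every successful pick deletes an element, which is the pattern box 3 of the New algorithm relies on when it draws the next node from XJ_NODES.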

1.3.5.6.   Name  Functions  (Intermediate  Results) 

The  name  functions  in  this  section  all  perform  some  operation  on  the 
input  sets  and  store  the  result  in  the  AC.   The  name  returned  is  always  AC. 
The  input  sets  remain  unchanged  (unless  one  of  them  is  AC). 

'UNION (s1, s2)' takes the union of sets s1 and s2.

'INTER (s1, s2)' takes the intersection of sets s1 and s2.

'COMPL (s)' takes the complement of the set s with respect to the
universal set of unrestricted objects (useless by itself).

'DIFF (s1, s2)' contains those objects belonging to s1 but not to s2.

'SYMDIF (s1, s2)' contains those objects belonging to exactly one of
the sets s1 and s2 (exclusive or).
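
Because every one of these functions returns the same object AC, nesting them needs some care. The sketch below is one possible model of that behavior, assuming a single shared accumulator that is overwritten by each call; it is not taken from the GASP source, and the names are illustrative.

```python
AC = set()   # the single shared accumulator, as assumed in this model


def _store_ac(value):
    """Overwrite AC with value and return AC itself (not a copy)."""
    AC.clear()
    AC.update(value)
    return AC


def union(s1, s2):
    return _store_ac(set(s1) | set(s2))


def inter(s1, s2):
    return _store_ac(set(s1) & set(s2))


def diff(s1, s2):
    return _store_ac(set(s1) - set(s2))


a, b, c, d = {1, 2}, {2, 3}, {3, 4}, {4, 5}

# One argument is AC, the other is not: the result is as expected.
ok = set(union(a, inter(b, c)))

# Both arguments are AC: the second inter() overwrites the first result.
clobbered = set(union(inter(a, b), inter(c, d)))
```

In this model `ok` is the intended union, while `clobbered` holds only the second intersection. Saving an intermediate result with a store into a named set before the next operation avoids the problem.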

1.3.6.   Saving  Object  Values 

'CALL PUSH (x);' saves the value of the object named x.

'CALL POP (x);' restores the saved value of the object named x. For
example, consider the following code:

                1
    CALL PUSH (x);
                2
        ...
                3
    CALL POP (x);
                4

The values (of all the parts) of the object named x will be the same at
points 1, 2, and 4 regardless of the values at point 3. The definition of
x remains unchanged throughout.

As  the  words  'push'  and  'pop'  imply,  any  number  of  copies  of  an  object 
may  be  saved  in  this  way,  and  restored  in  the  usual  'last  in  -  first  out' 
order.   Implementation  restrictions  will  limit  the  number  of  saved  objects 
at  any  point  during  execution. 
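
The last in, first out behavior can be sketched with an explicit stack of saved values. This is a minimal model, assuming each object keeps its own stack and that only the integer and set parts matter; the class and function names are hypothetical.

```python
class Obj:
    """Illustrative object with an integer part, a set part, and a save stack."""

    def __init__(self):
        self.intp = 0
        self.sset = set()
        self._saved = []      # saved values, most recent last


def push(o):
    """Save a copy of the current value of o."""
    o._saved.append((o.intp, set(o.sset)))


def pop(o):
    """Restore the most recently saved value of o."""
    o.intp, o.sset = o._saved.pop()


x = Obj()
x.intp, x.sset = 1, {"b1"}     # point 1
push(x)                        # point 2: value unchanged
x.intp = 7
x.sset.add("b2")               # point 3: value changed
push(x)                        # a second saved copy
x.sset.clear()
pop(x)                         # restores the point-3 value
after_inner = (x.intp, set(x.sset))
pop(x)                         # point 4: restores the point-1 value
```

The name x refers to the same object throughout; only the object's value is saved and restored.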

1.3.7.   I/O 

GASP  does  not  aid  the  user  in  the  input  of  sets. 

'CALL PELEMSK (s);' (Print ELEMent and SKip to next line) will print
the character string of the object s and the character string of all objects
belonging to the set of s.

EXAMPLE:

    DCL (S1, S2, E1, E2, E3, E4) $NAME;
    $ALLOC (S1); $ALLOC (E1); $ALLOC (E2);
    E3 = E1; E4 = E2; S2 = S1;
    $CSES (S2, E3); $CHANGES (S2, E4, ADD);
    CALL PELEMSK (S2); END;

would generate the output line

    S1 = (E1, E2)

and skip to the next line.

'$PUT (var-name);' is like 'PUT DATA (var-name);' but with no restrictions
on var-name.

'CALL  ABDUMP;'  dumps  the  entire  data  base  (of  objects). 

Regular  PL/1  I/O  is  also  available. 

1.3.8.  Expanding  Operations 

'EXPAND2 (subr, set)' is a name function which returns AC. Subr may be
any name function (e.g., UNION, INTER, SYMDIF) which takes two names as arguments
and performs a binary (usually associative and commutative) operation on their
sets, returning the name of the set which holds the result. For example, if
s = {e1, e2, e3}, then EXPAND2 (UNION, s) is equivalent to UNION(e1, UNION
(e2, e3)). If s is empty, then EXPAND2 (subr, s) is empty, and if
s = {e1} then EXPAND2 (subr, s) = (the value of the set) e1.

'CALL  EXPAND1  (subr,  set);'  is  a  procedure  call  which  can  be  used  with 
any  procedure  subr  which  takes  one  name  as  input.   EXPAND1  will  call  this 
routine  with  set  as  the  argument,  and  then  will  call  it  with  each  object 
belonging  to  set  as  the  argument.   Useful  choices  for  subr  include  PELEMSK, 
PUSH,  and  POP. 
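
EXPAND2 is in effect a fold of a binary set operation over the members of a set. A minimal sketch of that reading, assuming Python sets and `functools.reduce`; the function name `expand2` and the list-of-sets input are illustrative choices, not the GASP interface.

```python
from functools import reduce


def expand2(op, family):
    """Fold the binary set operation op over the sets in family.

    An empty family yields the empty set; a single member yields its own value.
    """
    members = list(family)
    if not members:
        return set()
    return set(reduce(op, members))


# Each branch is represented by the set of its two incident nodes.
branches = [{1, 2}, {2, 4}, {4, 1}]
touched_nodes = expand2(set.union, branches)
common_nodes = expand2(set.intersection, [{1, 2}, {2, 4}])
```

With branches represented by their endpoint sets, the union fold gives every node touched by the branch set, which matches how the New algorithm uses EXPAND2 (UNION, XJ) to obtain a node boundary.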

1.3.9.  Loop  Control 
'$FOR (q, x, s);'
    code
'$END;'
allows code to be executed iteratively with each iteration having a different
definition of the name x chosen from the set s. The quantifier, q, may
be any number, including ANY (equivalent to 1) and ALL (equivalent to the cardinality
of set s). Code will be executed minimum (q, ALL) times. 's' may be any name
or name function. Once the $FOR statement is executed, changing s will not
affect or be affected by the iterations. The $FOR - $END pair is a PL/1 block,
and the bound variable x is automatically declared within this block (it
need not be declared before).

The normal exit from a $FOR - $END section is to the next statement
after $END. Any other jump outside must be expressed as 'GO_TO label;'.

'$4ALLPAIRS (bv1, bv2, s)'
    code
'$4APEND;'
is similar to the $FOR statement except that code is executed for all possible
unordered choices of bound variables bv1 and bv2 subject to bv1, bv2 ∈ s
and bv1 ≠ bv2. Abnormal exits must be through the statement 'GO_2 label;'.
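
The two loop forms can be sketched as iterations over a snapshot of the set. This is an illustrative model, assuming the quantifier is passed as an integer or as the strings "ANY" and "ALL"; `gasp_for` and `all_pairs` are hypothetical names.

```python
from itertools import combinations


def gasp_for(q, s):
    """Elements visited by a $FOR (q, x, s) loop: min(q, |s|) members of a snapshot."""
    snapshot = list(s)    # later changes to s do not affect the iteration
    if q == "ALL":
        count = len(snapshot)
    elif q == "ANY":
        count = 1
    else:
        count = q
    return snapshot[:min(count, len(snapshot))]


def all_pairs(s):
    """Unordered pairs of distinct members, as visited by $4ALLPAIRS."""
    return list(combinations(list(s), 2))


nodes = {1, 2, 3, 4}
visited = []
for x in gasp_for("ALL", nodes):
    nodes.discard(x)      # changing the set inside the loop is safe
    visited.append(x)

pairs = all_pairs({1, 2, 3, 4})
```

Taking the snapshot before the loop starts is what makes it safe to modify the set inside the loop body.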

1.4.   The  GASP  Statements  for  Graphs 

Nodes will be denoted by n, n1, n2; branches by b, graphs by g.

1.4.1.  Truth  Value  Functions 

'INCIDENT  (n,  b)'  answers  "is  n  incident  to  b?". 
'ADJACENT  (n,  n2) '  answers  "is  n  adjacent  to   n2?". 

1.4.2.  Simple  Information  Extraction 

To get the set of adjacent nodes or incident branches of a given n or
the set of incident nodes of a given b, use the following name functions (all
return AC):

    'SET_OF_INCIDENT (NODES, n)',
    'SET_OF_INCIDENT (BRANCHES, n)', or
    'SET_OF_INCIDENT (NODES, b)'.

To get the set of nodes in g, use the name function (returns AC)
'$NOF(g)'. Similarly, the branches of g are obtained by '$BOF(g)'.

'CALL GET_BAN (b, n1, n2, g);' defines n1 and n2 to be the endpoints
of b in g.

1.4.3.  Advanced  Graph  Operations 

'NBOUND  (nodeset,  g) '  is  a  name  function  which  returns  (AC)  the  set  of 
all  nodes  of  g  which  do  not  belong  to  nodeset  but  are  adjacent  to  at  least 
one  node  belonging  to  nodeset.   'BBOUND  (nodeset,  g)'  is  a  name  function  which 
returns  (AC)  the  set  of  all  branches  of  g  which  have  exactly  one  endpoint 
belonging  to  nodeset. 

'CALL INTBANS (s, g);' is a procedure call which returns with s reassigned
the value of the subset of branches of g which have both endpoints
belonging to s (a set of nodes at input time).

'DIST (n1, n2, g)' is an integer function which returns the distance
from n1 to n2 in g.

'CALL  COLAPS  (b,  g)  ; '  is  a  procedure  call  which  changes  g  by  merging 
the  endpoints  of  b  into  a  single  node  and  removing  any  branches  connecting 
those  endpoints  (such  as   b). 

'CALL  DELBAN  (b,  g)  ;'  is  a  procedure  call  which  deletes  all  trace  of 
b   from  g. 

'CALL  DELNOD  (n,  g)  ; '  is  a  procedure  call  which  deletes  n  and  all  of 
its  incident  branches  from  g. 
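
The boundary operations can be sketched on a simple branch table. This is a model only, assuming a graph is given as a dictionary from branch names to endpoint pairs; the representation and the lower-case function names are illustrative and differ from the GASP object structure.

```python
def nbound(nodeset, branches):
    """Nodes outside nodeset that are adjacent to at least one node in nodeset."""
    out = set()
    for a, b in branches.values():
        if a in nodeset and b not in nodeset:
            out.add(b)
        if b in nodeset and a not in nodeset:
            out.add(a)
    return out


def bbound(nodeset, branches):
    """Branches with exactly one endpoint in nodeset."""
    return {name for name, (a, b) in branches.items()
            if (a in nodeset) != (b in nodeset)}


def intbans(nodeset, branches):
    """Branches with both endpoints in nodeset."""
    return {name for name, (a, b) in branches.items()
            if a in nodeset and b in nodeset}


# A four-node example graph with five branches (assumed for illustration).
g = {"b1": (1, 2), "b2": (2, 4), "b3": (2, 3), "b4": (3, 4), "b5": (4, 1)}
inside = {1, 2}
node_boundary = nbound(inside, g)
branch_boundary = bbound(inside, g)
internal = intbans(inside, g)
```

For the node set {1, 2}, branch b1 is internal and the other branches touching nodes 1 or 2 form the branch boundary.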

1.4.4.  Graph  I/O 

'CALL READGR (g);' is the procedure call to input g. The input format
is a sequence of paths of node numbers (from 1 to the number of nodes). A new
path is begun by a minus sign in front of the starting node [only Euler graphs
can be given by a single path]. The entire sequence is terminated by a zero.
READGR also will output g (see below).

EXAMPLE:   Given the input sequence

    1, 2, 4, -2, 3, 4, 1, 0,

READGR would create the graph shown in figure 12.


Figure  12 
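
The path format can be decoded with a short parser. The sketch below is one reading of the format described above, assuming that consecutive node numbers within a path are joined by a branch and that the first path may start without a minus sign; `parse_paths` is a hypothetical helper, not part of READGR.

```python
def parse_paths(seq):
    """Turn a READGR-style sequence into a list of (node, node) branches."""
    branches = []
    prev = None
    for v in seq:
        if v == 0:               # a zero terminates the whole sequence
            break
        if v < 0:                # a minus sign starts a new path at node -v
            prev = -v
            continue
        if prev is not None:     # consecutive nodes on a path share a branch
            branches.append((prev, v))
        prev = v
    return branches


example = [1, 2, 4, -2, 3, 4, 1, 0]
edges = parse_paths(example)
```

Under these assumptions the example sequence describes two paths, 1-2-4 and 2-3-4-1, giving five branches on four nodes.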

'CALL DEF_BAN (b, n1, n2, g);' is a procedure call which will create a
branch b connecting n1 and n2 in g.

' $PUTGRAPH  (g)  ;'  is  the  procedure  call  which  outputs  g  as  a  set  of 
nodes  and  branches.   It  is  equivalent  to  'CALL  EXPAND1  (PELEMSK,  g)  ;'. 

1.5.  Measuring  GASP  Programs 

A  count  of  the  number  of  executions  of  each  block  of  a  GASP  program  is 
accomplished  with  the  following  statements  [even  though  they  are  complete 
statements,  they  need  not  be  followed  by  a  semicolon]. 

'$DCLSTAT(k) '  declares   k   integers  to  be  used  for  counting. 

'$STAT'  is  placed  in  each  logical  block  to  be  counted. 

'$CLEARSTAT'  initializes  the  counts  to  zero. 

'$OUTSTAT' prints out the k integers, in the order that the '$STAT's
appeared (compilation-wise, not execution-wise).
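
Counting block executions rather than seconds can be sketched with a small counter table. This is an illustrative model, assuming the caller names each counter slot explicitly (GASP assigns slots by order of appearance at compile time); the class `BlockCounters` is hypothetical.

```python
class BlockCounters:
    """Model of $DCLSTAT / $CLEARSTAT / $STAT / $OUTSTAT with explicit slots."""

    def __init__(self, k):
        self.counts = [0] * k          # like $DCLSTAT(k)

    def clear(self):
        self.counts = [0] * len(self.counts)   # like $CLEARSTAT

    def stat(self, slot):
        self.counts[slot] += 1         # like one $STAT in a logical block

    def out(self):
        return list(self.counts)       # like $OUTSTAT, but returned, not printed


stats = BlockCounters(2)
stats.clear()
for j in range(3):
    stats.stat(0)        # outer block, executed on every iteration
    if j % 2 == 0:
        stats.stat(1)    # inner block, executed only for even j
```

Counts of this kind are independent of machine speed, which is the reason given in the conclusions for counting statements rather than seconds.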

1.6.  Implementation  Details 

1.6.1.   Data  Structure 

A Universal SET (USET) contains all GASP objects. USET is a PL/1 structure
subdivided into $TSIZE objects (level name is ELEMENT) of which $SIZE are
unrestricted. The current system has $SIZE = 64 and $TSIZE = 127, but these can
be changed easily. The set part (SSET) of an object is a bit string of length
$SIZE (this is the only reason for restricted objects: an increase in the number
of restricted objects increases the memory requirement only linearly; an increase
in unrestricted objects increases memory requirements quadratically). The other
parts of an object are CHARP (CHARacter string Part), INTP (INTeger Part), REFP
(REFerence Part), RP_2 and RP_3 (Reference Part 2 and 3). PL/1 declarations for
the various data types [1.2.1.] are as follows: $CHAR = CHAR (8), $INTEGER =
BIN FIXED (15), $NAME = BIN FIXED, and $BIT = BIT (1).
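
The bit-string representation and its memory cost can be sketched with integers used as bit masks. This is a model under the stated sizes, assuming unrestricted objects are numbered 0 to $SIZE - 1; the function names are illustrative and the storage estimate counts only the set parts.

```python
SIZE = 64     # number of unrestricted objects in the quoted system
TSIZE = 127   # total number of objects in the quoted system


def make_set(indices):
    """Build a bit string with one bit per unrestricted object index."""
    mask = 0
    for i in indices:
        if not 0 <= i < SIZE:
            raise ValueError("only unrestricted objects can be set members")
        mask |= 1 << i
    return mask


def union(a, b):
    return a | b


def inter(a, b):
    return a & b


def compl(a):
    return ~a & ((1 << SIZE) - 1)    # complement within the SIZE-bit universe


def card(a):
    return bin(a).count("1")


def sset_storage_bits(tsize, size):
    """Bits needed for all set parts: one SIZE-bit string per object."""
    return tsize * size


u = union(make_set([0, 3]), make_set([3, 5]))
```

Since every object carries a $SIZE-bit string, adding a restricted object costs one more string, while adding an unrestricted object also lengthens every string. That is the linear versus quadratic growth noted above.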

1.6.2.  How  GASP  Works 

GASP  procedures  which  require  only  a  line  or  so  of  code  are  translated 
by  the  PL/1  preprocessor.  The  identifiers  which  are  translated  by  the  GASP 
macros  usually  start  with  a  '$'. 

Longer GASP procedures are incorporated into the programs as separate
PL/1 procedures. The user has a choice of two methods which include these
procedures in his program. The more efficient way is to include them as precompiled
external procedures. The more flexible way is to have their source
code inserted into the main program: this allows the user to set the limits
$SIZE and $TSIZE to fit his needs.

1.6.3.  Cost  Parameters 

Since the PL/1 preprocessor and PL/1 compiler are used, compilation time
is usually large, while run time is usually reasonable. For example, a typical
program took 20 seconds to compile and 6 seconds to execute.

The core requirement for the basic GASP programs and data is around 120k
bytes; a typical program might require a total of 150k bytes.

GASP macro definitions require 206 lines of PL/1 code; the source code of
GASP procedures is around 240 lines.

1.6.4.   Implementation  Defects 

When  coding  a  binary  set  operation  (e.g.,  'UNION  (SI,  S2)'),  one  must 
make  sure  that  at  least  one  of  the  arguments  is  not  AC. 

'GO_TO' and 'GO_2' are precompiled into more than one PL/1 statement and
therefore should not appear immediately after a 'THEN'.

2.      THE NEW ALGORITHM PROGRAMMED IN GASP

NEW:  PROC(G,N);   DCL G $NAME, N $INTEGER;
      DCL ( SET_PJ, XJ_NODES, AJ, XJ, P(N) ) $NAME,
          ( IS_DISCON, D(N) ) $BIT,    J  $INTEGER;
      %DCL ( KJ, ZJ, SAVED, $CARD ) CHAR;   /* USE PARTS OF OBJECTS */
      %    KJ = 'INTP(XJ)' ;    %    ZJ = 'REFP(AJ)' ;    %    $CARD = 'INTP' ;
      %    SAVED = 'RP_2' ;  /* RP_2 POINTS TO THE SAVED VALUE OF OBJECTS */
      IF N < 3 THEN RETURN ;  $ALLOC(TEMP);
      $ALLOC(XJ); $ALLOC(SET_PJ); $ALLOC(XJ_NODES); $ALLOC(AJ); $ALLOC(YJ);
BOX1:   J = 1 ;
      P(1) = NODES#(1);    $CSES ( SET_PJ, P(1) ) ;
      $STORES ( XJ, SET_OF_INCIDENT ( BRANCHES, P(1))) ;
      $STORES ( YJ, $BOF ( G ) ) ;
      D( 1 ) = YES ;
      IS_DISCON = NO ;
BOX2 :   $STAT    /* COUNT C2(G, NEW)   */
      IF D(J) THEN DO;   $STAT
          $ALLOC(ZJ);   $STORES( ZJ, XJ ) ;
          $STORES(XJ_NODES, DIFF( EXPAND2(UNION,XJ), SET_PJ ) ) ;
          /*     NODE BOUNDARY ( P1, P2, ..., PJ )   */
          $CARD ( XJ_NODES ) = CARD ( XJ_NODES ) ;   END;
      KJ = 0 ;
BOX3 :   IF ¬CAN_PIC( P(J+1), XJ_NODES) THEN SIGNAL ERROR;
      $CARD ( XJ_NODES ) = $CARD ( XJ_NODES ) - 1 ;
      $STORES ( AJ, INTER ( XJ, P(J+1))) ;
      $STORES ( XJ, DIFF ( XJ, AJ ) ) ;
      $STORES ( YJ, DIFF ( YJ, AJ ) ) ;
      CALL PUSH (XJ) ;   /* X(J) <- X(J-1)   */
      $STORES ( XJ, UNION ( XJ, INTER(YJ,P(J+1)))) ;
      /* I.E.,X(J+1)<-   BOUND (P1, P2, ..., P(J+1)) */
XJ_EMPTY:   IF $EMPTY_( XJ ) THEN DO;   $STAT
          CALL POP ( XJ ) ;   GO TO DISCON ;   END ;
      $CHANGES ( SET_PJ, P(J+1), ADD ) ;
      CALL PUSH ( YJ ) ;   /* Y(J+1) <- Y(J)   */
      CALL PUSH ( XJ_NODES ) ;
      IF ¬ D(J) THEN DO ;
          $STORES ( TEMP, DIFF ( AJ, ZJ ) ) ;
          IF ¬ $EMPTY_( TEMP ) THEN DO;   $STAT
              /* NOW PAY THE PRICE FOR INCORRECT XJ   */
              $STORES ( AJ, INTER ( AJ, ZJ ) ) ;
              $STORES ( SAVED ( YJ ), UNION(YJ, TEMP) ) ;
              $STORES ( SAVED ( XJ ), UNION(SAVED(XJ), TEMP) ) ;   END;
          D(J+1) = YES ;
          GO TO BUMP ;   END;
      /* ELSE IF D(J) THEN   */
      IF $CARD ( XJ_NODES ) > 0 THEN D(J+1) = NO ;
      ELSE DO; $STAT    D(J+1) = YES ;
          $STORES ( AJ, ZJ ) ;   $KILL(ZJ) ;   END;
BUMP:   KJ = KJ + 1 ;
      CALL PUSH(AJ) ;
      J = J + 1 ;
J_EQ_N :   IF J < N-1 THEN GO TO BOX2 ;
BOX4 :   $STAT   /* C4(G, NEW) */
      IF D(J) THEN $STORES ( AJ, YJ) ;
      ELSE   $STORES ( AJ, INTER(ZJ, YJ) ) ;
      /* 'OUTPUT A1 X A2 X ... X A(N-1)' COMES HERE */
BOX5 :   J = J - 1 ;
      CALL POP(AJ); CALL POP(YJ); CALL POP(XJ); CALL POP(XJ_NODES);
      $CHANGES ( SET_PJ, P(J+1), DELETE ) ;
      IF IS_DISCON THEN GO TO DISCON ;
XJ_EMPTY_:   IF $CARD (XJ_NODES) > 0 THEN GO TO BOX3 ;
J_EQ_1 :   IF J = 1 THEN GO TO RETURN_ ;
      GO TO BOX5;
DISCON :  $STAT
      IF D(J) THEN DO;   $KILL(ZJ);   $STAT   END;
      IS_DISCON = ( KJ = 1 ) ;
      GO TO J_EQ_1 ;
RETURN_ :  $KILL(YJ); $KILL(AJ); $KILL(XJ); $KILL(XJ_NODES);
      $KILL(TEMP);   $KILL(SET_PJ) ;
      RETURN; END NEW;



VITA 

The author, Stephen Martin Chase, was born in Urbana,
Illinois, on September 21, 1943. He received his Bachelor of Science
degree in Mathematics in June 1965, and his Master of Science degree
in Mathematics in June 1967 from the University of Illinois. From
June 1965 to June 1970, he was a research assistant in the Department
of Computer Science of the University of Illinois at Urbana-Champaign.
In June 1970, he joined the research staff of the Thomas J. Watson
Research Center in Yorktown Heights, New York.




